PREFACE

THE subject matter of this book was first broached in the brain
of Leibniz, who, in the dissertation, written in his twenty-third
year, on the mode of electing the kings of Poland, conceived
of Probability as a branch of Logic. A few years before, "a
problem," in the words of Poisson, "proposed to an austere
Jansenist by a man of the world, was the origin of the calculus
of probabilities." In the intervening centuries the algebraical
exercises, in which the Chevalier de la Méré interested Pascal,
have so far predominated in the learned world over the
profounder enquiries of the philosopher into those processes of
human faculty which, by determining reasonable preference,
guide our choice, that Probability is oftener reckoned with
Mathematics than with Logic. There is much here, therefore, which is
novel, and, being novel, unsifted, inaccurate, or deficient. I
propound my systematic conception of this subject for criticism
and enlargement at the hand of others, doubtful whether I
myself am likely to get much further, by waiting longer,
with a work, which, beginning as a Fellowship Dissertation,
and interrupted by the war, has already extended over
many years.

It may be perceived that I have been much influenced by
W. E. Johnson, G. E. Moore, and Bertrand Russell, that is
to say, by Cambridge, which, with great debts to the writers
of Continental Europe, yet continues in direct succession
the English tradition of Locke and Berkeley and Hume, of
Mill and Sidgwick, who, in spite of their divergences of
doctrine, are united in a preference for what is matter of
fact, and have conceived their subject as a branch rather of
science than of the creative imagination, prose writers, hoping
to be understood.

J. M. KEYNES.

KING'S COLLEGE, CAMBRIDGE,
May 1, 1920.
CHAPTER I

THE MEANING OF PROBABILITY

"I have said more than once that we need a new kind of logic,
which would treat of degrees of Probability." —LEIBNIZ.

1. PART of our knowledge we obtain direct; and part by
argument. The Theory of Probability is concerned with that
part which we obtain by argument, and it treats of the different
degrees in which the results so obtained are conclusive or
inconclusive.

In most branches of academic logic, such as the theory of the
syllogism or the geometry of ideal space, all the arguments aim
at demonstrative certainty. They claim to be conclusive. But
many other arguments are rational and claim some weight
without pretending to be certain. In Metaphysics, in Science, and in
Conduct, most of the arguments, upon which we habitually base
our rational beliefs, are admitted to be inconclusive in a greater
or less degree. Thus for a philosophical treatment of these
branches of knowledge, the study of probability is required.

The course which the history of thought has led Logic to follow
has encouraged the view that doubtful arguments are not within
its scope. But in the actual exercise of reason we do not wait
on certainty, or deem it irrational to depend on a doubtful
argument. If logic investigates the general principles of valid
thought, the study of arguments, to which it is rational to attach
some weight, is as much a part of it as the study of those which
are demonstrative.

2. The terms certain and probable describe the various degrees
of rational belief about a proposition which different amounts of
knowledge authorise us to entertain. All propositions are true
or false, but the knowledge we have of them depends on our
circumstances; and while it is often convenient to speak of
propositions as certain or probable, this expresses strictly a
relationship in which they stand to a corpus of knowledge, actual or
hypothetical, and not a characteristic of the propositions in
themselves. A proposition is capable at the same time of varying degrees
of this relationship, depending upon the knowledge to which it is
related, so that it is without significance to call a proposition
probable unless we specify the knowledge to which we are relating it.

To this extent, therefore, probability may be called
subjective. But in the sense important to logic, probability is not
subjective. It is not, that is to say, subject to human caprice.
A proposition is not probable because we think it so. When once
the facts are given which determine our knowledge, what is
probable or improbable in these circumstances has been fixed
objectively, and is independent of our opinion. The Theory of
Probability is logical, therefore, because it is concerned with the
degree of belief which it is rational to entertain in given conditions,
and not merely with the actual beliefs of particular individuals,
which may or may not be rational.

Given the body of direct knowledge which constitutes our
ultimate premisses, this theory tells us what further rational
beliefs, certain or probable, can be derived by valid argument
from our direct knowledge. This involves purely logical
relations between the propositions which embody our direct
knowledge and the propositions about which we seek indirect
knowledge. What particular propositions we select as the premisses
of our argument naturally depends on subjective factors peculiar
to ourselves; but the relations, in which other propositions stand
to these, and which entitle us to probable beliefs, are objective
and logical.

3. Let our premisses consist of any set of propositions h, and
our conclusion consist of any set of propositions a, then, if a
knowledge of h justifies a rational belief in a of degree α, we say
that there is a probability-relation of degree α between a and h.1
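Keynes holds that the probability-relation cannot be defined (§ 8 below), so no program captures it. Purely to illustrate its relational form, the sketch below assumes a hypothetical toy model: a proposition is the set of equally weighted cases (faces of a die) in which it is true, and the degree of a/h is taken to be the share of the cases of h in which a also holds. That ratio is an assumption borrowed from the classical calculus, not the author's definition; the sketch shows only that one and the same conclusion a stands in different degrees to different premisses h.

```python
from fractions import Fraction

# Hypothetical toy model: a "proposition" is the set of cases (here, faces
# of a die) in which it is true, and every case is assumed equally weighted.
CASES = frozenset(range(1, 7))

def degree(a, h):
    """Toy degree of the probability-relation a/h: the share of the cases
    compatible with the evidence h in which a is also true."""
    if not h:
        raise ValueError("evidence h must be self-consistent (non-empty)")
    return Fraction(len(a & h), len(h))

a = frozenset({6})         # conclusion: "the throw is a six"
h1 = CASES                 # evidence: "a die was thrown"
h2 = frozenset({2, 4, 6})  # evidence: "the throw was even"

print(degree(a, h1))  # 1/6
print(degree(a, h2))  # 1/3
```

Neither value says that a is "probable" without qualification; each is fixed only once the premisses h are given.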

In ordinary speech we often describe the conclusion as being
doubtful, uncertain, or only probable. But, strictly, these terms
ought to be applied, either to the degree of our rational belief in
the conclusion, or to the relation or argument between two sets
of propositions, knowledge of which would afford grounds for a
corresponding degree of rational belief.2

1 This will be written a/h = α.
2 See also Chapter II. § 5.
4. With the term "event," which has taken hitherto so
important a place in the phraseology of the subject, I shall
dispense altogether.1 Writers on Probability have generally dealt
with what they term the "happening" of "events." In the
problems which they first studied this did not involve much
departure from common usage. But these expressions are now
used in a way which is vague and ambiguous; and it will be
more than a verbal improvement to discuss the truth and the
probability of propositions instead of the occurrence and the
probability of events.2

1 Except in those chapters (Chap. XVII., for example) where I am dealing
chiefly with the work of others.

2 The first writer I know of to notice this was Ancillon in Doutes sur les
bases du calcul des probabilités (1794): "To say that a fact, past, present,
or to come, is probable, is to say that a proposition is probable." The point
was emphasised by Boole, Laws of Thought, pp. 7 and 107. See also Czuber,
Wahrscheinlichkeitsrechnung, vol. i. p. 5, and Stumpf, Über den Begriff der
mathematischen Wahrscheinlichkeit.

5. These general ideas are not likely to provoke much
criticism. In the ordinary course of thought and argument,
we are constantly assuming that knowledge of one statement,
while not proving the truth of a second, yields nevertheless
some ground for believing it. We assert that we ought on the
evidence to prefer such and such a belief. We claim rational
grounds for assertions which are not conclusively demonstrated.
We allow, in fact, that statements may be unproved, without, for
that reason, being unfounded. And it does not seem on reflection
that the information we convey by these expressions is wholly
subjective. When we argue that Darwin gives valid grounds
for our accepting his theory of natural selection, we do not simply
mean that we are psychologically inclined to agree with him;
it is certain that we also intend to convey our belief that
we are acting rationally in regarding his theory as
probable. We believe that there is some real objective relation
between Darwin's evidence and his conclusions, which is
independent of the mere fact of our belief, and which is just as real
and objective, though of a different degree, as that which would
exist if the argument were as demonstrative as a syllogism.
We are claiming, in fact, to cognise correctly a logical connection
between one set of propositions which we call our evidence and
which we suppose ourselves to know, and another set which we
call our conclusions, and to which we attach more or less weight
according to the grounds supplied by the first. It is this type
of objective relation between sets of propositions—the type
which we claim to be correctly perceiving when we make such
assertions as these—to which the reader's attention must be
directed.

6. It is not straining the use of words to speak of this as the
relation of probability. It is true that mathematicians have
employed the term in a narrower sense; for they have often
confined it to the limited class of instances in which the relation
is adapted to an algebraical treatment. But in common usage
the word has never received this limitation.

Students of probability in the sense which is meant by the
authors of typical treatises on Wahrscheinlichkeitsrechnung or
Calcul des probabilités, will find that I do eventually reach topics
with which they are familiar. But in making a serious attempt
to deal with the fundamental difficulties with which all students
of mathematical probabilities have met and which are notoriously
unsolved, we must begin at the beginning (or almost at the
beginning) and treat our subject widely. As soon as
mathematical probability ceases to be the merest algebra or pretends
to guide our decisions, it immediately meets with problems
against which its own weapons are quite powerless. And even
if we wish later on to use probability in a narrow sense, it will
be well to know first what it means in the widest.

7. Between two sets of propositions, therefore, there exists
a relation, in virtue of which, if we know the first, we can attach
to the latter some degree of rational belief. This relation is the
subject-matter of the logic of probability.

A great deal of confusion and error has arisen out of a
failure to take due account of this relational aspect of
probability. From the premisses "a implies b" and "a is true," we
can conclude something about b—namely that b is true—which
does not involve a. But, if a is so related to b, that a knowledge
of it renders a probable belief in b rational, we cannot conclude
anything whatever about b which has not reference to a; and it
is not true that every set of self-consistent premisses which
includes a has this same relation to b. It is as useless,
therefore, to say "b is probable" as it would be to say "b is equal,"
or "b is greater than," and as unwarranted to conclude that,
because a makes b probable, therefore a and c together make b
probable, as to argue that because a is less than b, therefore a
and c together are less than b.
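The contrast between implication and the probability-relation can be made concrete in a hypothetical toy model: die faces as equally weighted cases, implication as set inclusion, and the degree of b/h taken as a ratio of cases. The ratio is an illustrative assumption, not Keynes's account of probability. Adding a premiss c can never undo an implication, because the cases of "a and c" are among the cases of a; but the degree of b/a can collapse once c is added, even though a and c are consistent with each other.

```python
from fractions import Fraction

# Hypothetical toy model: a proposition is the set of die faces on which it
# is true; all six faces are assumed equally weighted.
CASES = frozenset(range(1, 7))

def degree(b, h):
    """Toy degree of b/h: the share of the cases of h in which b also holds."""
    if not h:
        raise ValueError("premisses must be self-consistent (non-empty)")
    return Fraction(len(b & h), len(h))

def implies(a, b):
    """a implies b when every case in which a holds is a case of b."""
    return a <= b

a = frozenset({1, 2, 3})     # premiss: "the throw is at most three"
b = frozenset({1, 2})        # conclusion: "the throw is at most two"
c = frozenset({3, 4})        # added premiss: "the throw is three or four"
d = frozenset({1, 2, 3, 4})  # "the throw is at most four", implied by a

# Implication survives the added premiss: a & c is a subset of a.
print(implies(a, d), implies(a & c, d))  # True True

# The probability-relation does not: a makes b probable, a and c exclude it.
print(degree(b, a))      # 2/3
print(degree(b, a & c))  # 0
```

Here a and c share the case 3, so the fall from 2/3 to 0 is not produced by contradictory premisses; it reflects only that the relation holds between b and a particular body of premisses.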

Thus, when in ordinary speech we name some opinion as
probable without further qualification, the phrase is generally
elliptical. We mean that it is probable when certain
considerations, implicitly or explicitly present to our minds at the moment,
are taken into account. We use the word for the sake of
shortness, just as we speak of a place as being three miles distant,
when we mean three miles distant from where we are then situated,
or from some starting-point to which we tacitly refer. No
proposition is in itself either probable or improbable, just as no
place can be intrinsically distant; and the probability of the
same statement varies with the evidence presented, which is,
as it were, its origin of reference. We may fix our attention
on our own knowledge and, treating this as our origin, consider
the probabilities of all other suppositions—according to the
usual practice which leads to the elliptical form of common
speech: or we may, equally well, fix it on a proposed conclusion
and consider what degree of probability this would derive from
various sets of assumptions, which might constitute the corpus of
knowledge of ourselves or others, or which are merely
hypotheses.

Reflection will show that this account harmonises with
familiar experience. There is nothing novel in the supposition
that the probability of a theory turns upon the evidence by which
it is supported: and it is common to assert that an opinion was
probable on the evidence at first to hand, but on further
information was untenable. As our knowledge or our hypothesis changes,
our conclusions have new probabilities, not in themselves, but
relatively to these new premisses. New logical relations have
now become important, namely those between the conclusions
which we are investigating and our new assumptions; but the
old relations between the conclusions and the former assumptions
still exist and are just as real as these new ones. It would be
as absurd to deny that an opinion was probable, when at a later
stage certain objections have come to light, as to deny, when
we have reached our destination, that it was ever three miles
distant; and the opinion still is probable in relation to the old
hypotheses, just as the destination is still three miles distant
from our starting-point.
8. A definition of probability is not possible, unless it contents
us to define degrees of the probability-relation by reference to
degrees of rational belief. We cannot analyse the probability-relation
in terms of simpler ideas. As soon as we have passed
from the logic of implication and the categories of truth and
falsehood to the logic of probability and the categories of
knowledge, ignorance, and rational belief, we are paying attention to
a new logical relation in which, although it is logical, we were
not previously interested, and which cannot be explained or
defined in terms of our previous notions.

This opinion is, from the nature of the case, incapable of
positive proof. The presumption in its favour must arise partly
out of our failure to find a definition, and partly because the
notion presents itself to the mind as something new and
independent. If the statement that an opinion was probable on the
evidence at first to hand, but became untenable on further
information, is not solely concerned with psychological belief, I
do not know how the element of logical doubt is to be defined,
or how its substance is to be stated, in terms of the other
indefinables of formal logic. The attempts at definition, which
have been made hitherto, will be criticised in later chapters.
I do not believe that any of them accurately represent that
particular logical relation which we have in our minds when we
speak of the probability of an argument.

In the great majority of cases the term "probable" seems to
be used consistently by different persons to describe the same
concept. Differences of opinion have not been due, I think, to
a radical ambiguity of language. In any case a desire to reduce
the indefinables of logic can easily be carried too far. Even if
a definition is discoverable in the end, there is no harm in
postponing it until our enquiry into the object of definition is far
advanced. In the case of "probability" the object before the
mind is so familiar that the danger of misdescribing its qualities
through lack of a definition is less than if it were a highly abstract
entity far removed from the normal channels of thought.

9. This chapter has served briefly to indicate, though not
to define, the subject matter of the book. Its object has
been to emphasise the existence of a logical relation between two
sets of propositions in cases where it is not possible to argue
demonstratively from one to the other. This is a contention
of a most fundamental character. It is not entirely novel, but
has seldom received due emphasis, is often overlooked, and
sometimes denied. The view, that probability arises out of
the existence of a specific relation between premiss and conclusion,
depends for its acceptance upon a reflective judgment on the
true character of the concept. It will be our object to discuss,
under the title of Probability, the principal properties of this
relation. First, however, we must digress in order to consider
briefly what we mean by knowledge, rational belief, and argument.


CHAPTER II

PROBABILITY IN RELATION TO THE THEORY OF KNOWLEDGE

1. I DO not wish to become involved in questions of epistemology
to which I do not know the answer; and I am anxious to reach
as soon as possible the particular part of philosophy or logic
which is the subject of this book. But some explanation is
necessary if the reader is to be put in a position to understand
the point of view from which the author sets out; I will,
therefore, expand some part of what has been outlined or assumed
in the first chapter.

2. There is, first of all, the distinction between that part of
our belief which is rational and that part which is not. If a
man believes something for a reason which is preposterous or
for no reason at all, and what he believes turns out to be true for
some reason not known to him, he cannot be said to believe it
rationally, although he believes it and it is in fact true. On the
other hand, a man may rationally believe a proposition to be
probable, when it is in fact false. The distinction between
rational belief and mere belief, therefore, is not the same as the
distinction between true beliefs and false beliefs. The highest
degree of rational belief, which is termed certain rational belief,
corresponds to knowledge. We may be said to know a thing
when we have a certain rational belief in it, and vice versa. For
reasons which will appear from our account of probable degrees
of rational belief in the following paragraph, it is preferable to
regard knowledge as fundamental and to define rational belief by
reference to it.

3. We come next to the distinction between that part of our
rational belief which is certain and that part which is only
probable. Belief, whether rational or not, is capable of degree.
The highest degree of rational belief, or rational certainty of
belief, and its relation to knowledge have been introduced above.
What, however, is the relation to knowledge of probable degrees
of rational belief?

The proposition (say, q) that we know in this case is not the
same as the proposition (say, p) in which we have a probable
degree (say, α) of rational belief. If the evidence upon which
we base our belief is h, then what we know, namely q, is that
the proposition p bears the probability-relation of degree α to
the set of propositions h; and this knowledge of ours justifies
us in a rational belief of degree α in the proposition p. It will
be convenient to call propositions such as p, which do not contain
assertions about probability-relations, "primary propositions";
and propositions such as q, which assert the existence of a
probability-relation, "secondary propositions."1

1 This classification of "primary" and "secondary" propositions was
suggested to me by Mr. W. E. Johnson.

4. Thus knowledge of a proposition always corresponds to
certainty of rational belief in it and at the same time to actual
truth in the proposition itself. We cannot know a proposition
unless it is in fact true. A probable degree of rational belief
in a proposition, on the other hand, arises out of knowledge of
some corresponding secondary proposition. A man may
rationally believe a proposition to be probable when it is in fact false,
if the secondary proposition on which he depends is true and
certain; while a man cannot rationally believe a proposition
to be probable even when it is in fact true, if the secondary
proposition on which he depends is not true. Thus rational
belief of whatever degree can only arise out of knowledge,
although the knowledge may be of a proposition secondary, in
the above sense, to the proposition in which the rational degree
of belief is entertained.

5. At this point it is desirable to colligate the three senses
in which the term probability has been so far employed. In its
most fundamental sense, I think, it refers to the logical relation
between two sets of propositions, which in § 3 of Chapter I. I
have termed the probability-relation. It is with this that I shall
be mainly concerned in the greater part of this Treatise.
Derivative from this sense, we have the sense in which, as above, the
term probable is applied to the degrees of rational belief arising
out of knowledge of secondary propositions which assert the
existence of probability-relations in the fundamental logical sense.
Further it is often convenient, and not necessarily misleading,
to apply the term probable to the proposition which is the object
of the probable degree of rational belief, and which bears the
probability-relation in question to the propositions comprising
the evidence.

6. I turn now to the distinction between direct and indirect
knowledge—between that part of our rational belief which we
know directly and that part which we know by argument.

We start from things, of various classes, with which we have,
what I choose to call without reference to other uses of this term,
direct acquaintance. Acquaintance with such things does not in
itself constitute knowledge, although knowledge arises out of
acquaintance with them. The most important classes of things
with which we have direct acquaintance are our own sensations,
which we may be said to experience, the ideas or meanings, about
which we have thoughts and which we may be said to understand,
and facts or characteristics or relations of sense-data or meanings,
which we may be said to perceive;—experience, understanding,
and perception being three forms of direct acquaintance.

The objects of knowledge and belief—as opposed to the
objects of direct acquaintance which I term sensations, meanings,
and perceptions—I shall term propositions.

Now our knowledge of propositions seems to be obtained in
two ways: directly, as the result of contemplating the objects
of acquaintance; and indirectly, by argument, through perceiving
the probability-relation of the proposition, about which we seek
knowledge, to other propositions. In the second case, at any
rate at first, what we know is not the proposition itself but a
secondary proposition involving it. When we know a secondary
proposition involving the proposition p as subject, we may be
said to have indirect knowledge about p.

Indirect knowledge about p may in suitable conditions lead
to rational belief in p of an appropriate degree. If this degree
is that of certainty, then we have not merely indirect knowledge
about p, but indirect knowledge of p.

7. Let us take examples of direct knowledge. From
acquaintance with a sensation of yellow I can pass directly to a
knowledge of the proposition "I have a sensation of yellow."
From acquaintance with a sensation of yellow and with the
meanings of "yellow," "colour," "existence," I may be able
to pass to a direct knowledge of the propositions "I understand
the meaning of yellow," "my sensation of yellow exists," "yellow
is a colour." Thus, by some mental process of which it is
difficult to give an account, we are able to pass from direct
acquaintance with things to a knowledge of propositions about
the things of which we have sensations or understand the
meaning.

Next, by the contemplation of propositions of which we have
direct knowledge, we are able to pass indirectly to knowledge of or
about other propositions. The mental process by which we pass
from direct knowledge to indirect knowledge is in some cases and
in some degree capable of analysis. We pass from a knowledge
of the proposition a to a knowledge about the proposition b by
perceiving a logical relation between them. With this logical
relation we have direct acquaintance. The logic of knowledge is
mainly occupied with a study of the logical relations, direct
acquaintance with which permits direct knowledge of the
secondary proposition asserting the probability-relation, and so
to indirect knowledge about, and in some cases of, the primary
proposition.

It is not always possible, however, to analyse the mental
process in the case of indirect knowledge, or to say by the
perception of what logical relation we have passed from the
knowledge of one proposition to knowledge about another. But
although in some cases we seem to pass directly from one
proposition to another, I am inclined to believe that in all legitimate
transitions of this kind some logical relation of the proper kind
must exist between the propositions, even when we are not
explicitly aware of it. In any case, whenever we pass to
knowledge about one proposition by the contemplation of it in
relation to another proposition of which we have knowledge,
even when the process is unanalysable, I call it an argument.
The knowledge, such as we have in ordinary thought by passing
from one proposition to another without being able to say what
logical relations, if any, we have perceived between them, may
be termed uncompleted knowledge. And knowledge, which
results from a distinct apprehension of the relevant logical
relations, may be termed knowledge proper.

8. In this way, therefore, I distinguish between direct and
indirect knowledge, between that part of our rational belief which
is based on direct knowledge and that part which is based on
argument. About what kinds of things we are capable of
knowing propositions directly, it is not easy to say. About our
own existence, our own sense-data, some logical ideas, and some
logical relations, it is usually agreed that we have direct
knowledge. Of the law of gravity, of the appearance of the other
side of the moon, of the cure for phthisis, of the contents of
Bradshaw, it is usually agreed that we do not have direct
knowledge. But many questions are in doubt. Of which logical
ideas and relations we have direct acquaintance, as to whether
we can ever know directly the existence of other people, and as
to when we are knowing propositions about sense-data directly
and when we are interpreting them—it is not possible to give
a clear answer. Moreover, there is another and peculiar kind
of derivative knowledge—by memory.

At a given moment there is a great deal of our knowledge
which we know neither directly nor by argument—we remember
it. We may remember it as knowledge, but forget how we
originally knew it. What we once knew and now consciously
remember, can fairly be called knowledge. But it is not easy to
draw the line between conscious memory, unconscious memory
or habit, and pure instinct or irrational associations of ideas
(acquired or inherited)—the last of which cannot fairly be called
knowledge, for unlike the first two it did not even arise (in us at
least) out of knowledge. Especially in such a case as that of
what our eyes tell us, it is difficult to distinguish between the
different ways in which our beliefs have arisen. We cannot
always tell, therefore, what is remembered knowledge and what is
not knowledge at all; and when knowledge is remembered, we
do not always remember at the same time whether, originally, it
was direct or indirect.

Although it is with knowledge by argument that I shall be mainly concerned in this book, there is one kind of direct knowledge, namely of secondary propositions, with which I cannot help but be involved. In the case of every argument, it is only directly that we can know the secondary proposition which makes the argument itself valid and rational. When we know something by argument, this must be through direct acquaintance with some logical relation between the conclusion and the premiss. In all knowledge, therefore, there is some direct element; and logic can never be made purely mechanical. All it can do is so to arrange the reasoning that the logical relations, which have to be perceived directly, are made explicit and are of a simple kind.

9. It must be added that the term certainty is sometimes used in a merely psychological sense to describe a state of mind without reference to the logical grounds of the belief. With this sense I am not concerned. It is also used to describe the highest degree of rational belief; and this is the sense relevant to our present purpose. The peculiarity of certainty is that knowledge of a secondary proposition involving certainty, together with knowledge of what stands in this secondary proposition in the position of evidence, leads to knowledge of, and not merely about, the corresponding primary proposition. Knowledge, on the other hand, of a secondary proposition involving a degree of probability lower than certainty, together with knowledge of the premiss of the secondary proposition, leads only to a rational belief of the appropriate degree in the primary proposition. The knowledge present in this latter case I have called knowledge about the primary proposition or conclusion of the argument, as distinct from knowledge of it.

Of probability we can say no more than that it is a lower degree of rational belief than certainty; and we may say, if we like, that it deals with degrees of certainty.1 Or we may make probability the more fundamental of the two and regard certainty as a special case of probability, as being, in fact, the maximum probability. Speaking somewhat loosely we may say that, if our premisses make the conclusion certain, then it follows from the premisses; and if they make it very probable, then it very nearly follows from them.

1 This view has often been taken, e.g., by Bernoulli and, incidentally, by Laplace; also by Fries (see Czuber, Entwicklung, p. 12). The view, occasionally held, that probability is concerned with degrees of truth, arises out of a confusion between certainty and truth. Perhaps the Aristotelian doctrine that future events are neither true nor false arose in this way.

It is sometimes useful to use the term "impossibility" as the negative correlative of "certainty," although the former sometimes has a different set of associations. If a is certain, then the contradictory of a is impossible. If a knowledge of a makes b certain, then a knowledge of a makes the contradictory of b impossible. Thus a proposition is impossible with respect to a given premiss if it is disproved by the premiss; and the relation of impossibility is the relation of minimum probability.1

1 Necessity and Impossibility, in the senses in which these terms are used in the theory of Modality, seem to correspond to the relations of Certainty and Impossibility in the theory of probability, the other modals, which comprise the intermediate degrees of possibility, corresponding to the intermediate degrees of probability. Almost up to the end of the seventeenth century the traditional treatment of modals is, in fact, a primitive attempt to bring the relations of probability within the scope of formal logic.

10. We have distinguished between rational belief and irrational belief, and also between rational beliefs which are certain in degree and those which are only probable. Knowledge has been distinguished according as it is direct or indirect, according as it is of primary or secondary propositions, and according as it is of or merely about its object.

In order that we may have a rational belief in a proposition p of the degree of certainty, it is necessary that one of two conditions should be fulfilled: (i.) that we know p directly; or (ii.) that we know a set of propositions h, and also know some secondary proposition q asserting a certainty-relation between p and h. In the latter case h may include secondary as well as primary propositions, but it is a necessary condition that all the propositions h should be known. In order that we may have rational belief in p of a lower degree of probability than certainty, it is necessary that we know a set of propositions h, and also know some secondary proposition q asserting a probability-relation between p and h.

In the above account one possibility has been ruled out. It is assumed that we cannot have a rational belief in p of a degree less than certainty except through knowing a secondary proposition of the prescribed type. Such belief can only arise, that is to say, by means of the perception of some probability-relation. To employ a common use of terms (though one inconsistent with the use adopted above), I have assumed that all direct knowledge is certain. All knowledge, that is to say, which is obtained in a manner strictly direct by contemplation of the objects of acquaintance and without any admixture whatever of argument and the contemplation of the logical bearing of any other knowledge on this, corresponds to certain rational belief and not to a merely probable degree of rational belief. It is true that there do seem to be degrees of knowledge and rational belief when the source of the belief is solely in acquaintance, as there are when its source is in argument. But I think that this appearance arises partly out of the difficulty of distinguishing direct from indirect knowledge, and partly out of a confusion between probable knowledge and vague knowledge. I cannot attempt here to analyse the meaning of vague knowledge. It is certainly not the same thing as knowledge proper, whether certain or probable, and it does not seem likely that it is susceptible of strict logical treatment. At any rate I do not know how to deal with it, and in spite of its importance I will not complicate a difficult subject by endeavouring to treat adequately the theory of vague knowledge.

I assume then that only true propositions can be known, that the term "probable knowledge" ought to be replaced by the term "probable degree of rational belief," and that a probable degree of rational belief cannot arise directly but only as the result of an argument, out of the knowledge, that is to say, of a secondary proposition asserting some logical probability-relation in which the object of the belief stands to some known proposition. With arguments, if they exist, the ultimate premisses of which are known in some other manner than that described above, such as might be called "probable knowledge," my theory is not adequate to deal without modification.1

1 I do not mean to imply, however, at any rate at present, that the ultimate premisses of an argument need always be primary propositions.

For the objects of certain belief which is based on direct knowledge, as opposed to certain belief arising indirectly, there is a well-established expression; propositions in which our rational belief is both certain and direct are said to be self-evident.

11. In conclusion, the relativity of knowledge to the individual may be briefly touched on. Some part of knowledge (knowledge of our own existence or of our own sensations) is clearly relative to individual experience. We cannot speak of knowledge absolutely, only of the knowledge of a particular person. Other parts of knowledge (knowledge of the axioms of logic, for example) may seem more objective. But we must admit, I think, that this too is relative to the constitution of the human mind, and that the constitution of the human mind may vary in some degree from man to man. What is self-evident to me and what I really know may be only a probable belief to you, or may form no part of your rational beliefs at all. And this may be true not only of such things as my existence, but of some logical axioms also. Some men (indeed it is obviously the case) may have a greater power of logical intuition than others. Further, the difference between some kinds of propositions over which human intuition seems to have power, and some over which it has none, may depend wholly upon the constitution of our minds and have no significance for a perfectly objective logic. We can no more assume that all true secondary propositions are or ought to be universally known than that all true primary propositions are known. The perceptions of some relations of probability may be outside the powers of some or all of us.

What  we  know  and  what  probability  we  can  attribute  to  our 
rational  beliefs  is,  therefore,  subjective  in  the  sense  of  being 
relative  to  the  individual.  But  given  the  body  of  premisses  which 
our  subjective  powers  and  circumstances  supply  to  us,  and  given 
the  kinds  of  logical  relations,  upon  which  arguments  can  be  based 
and  which  we  have  the  capacity  to  perceive,  the  conclusions, 
which  it  is  rational  for  us  to  draw,  stand  to  these  premisses  in  an 
objective  and  wholly  logical  relation.  Our  logic  is  concerned 
with  drawing  conclusions  by  a  series  of  steps  of  certain  specified 
kinds  from  a  limited  body  of  premisses. 

With  these  brief  indications  as  to  the  relation  of  Probability, 
as  I  understand  it,  to  the  Theory  of  Knowledge,  I  pass  from 
problems  of  ultimate  analysis  and  definition,  which  are  not  the 
primary  subject  matter  of  this  book,  to  the  logical  theory  and 
superstructure,  which  occupies  an  intermediate  position  between 
the  ultimate  problems  and  the  applications  of  the  theory,  whether 
such  applications  take  a  generalised  mathematical  form  or  a 
concrete  and  particular  one.  For  this  purpose  it  would  only 
encumber  the  exposition,  without  adding  to  its  clearness  or  its 
accuracy,  if  I  were  to  employ  the  perfectly  exact  terminology 
and  minute  refinements  of  language,  which  are  necessary  for  the 
avoidance  of  error  in  very  fundamental  enquiries.  While  taking 
pains,  therefore,  to  avoid  any  divergence  between  the  substance 
of  this  chapter  and  of  those  which  succeed  it,  and  to  employ  only 
such  periphrases  as  could  be  translated,  if  desired,  into  perfectly 
exact  language,  I  shall  not  cut  myself  off  from  the  convenient, 
but  looser,  expressions,  which  have  been  habitually  employed 
by previous writers and have the advantage of being, in a general
way at least, immediately intelligible to the reader.1

1 This question, which faces all contemporary writers on logical and philosophical subjects, is in my opinion much more a question of style, and therefore to be settled on the same sort of considerations as other such questions, than is generally supposed. There are occasions for very exact methods of statement, such as are employed in Mr. Russell's Principia Mathematica. But there are advantages also in writing the English of Hume. Mr. Moore has developed in Principia Ethica an intermediate style which in his hands has force and beauty. But those writers, who strain after exaggerated precision without going the whole hog with Mr. Russell, are sometimes merely pedantic. They lose the reader's attention, and the repetitious complication of their phrases eludes his comprehension, without their really attaining, to compensate, a complete precision. Confusion of thought is not always best avoided by technical and unaccustomed expressions, to which the mind has no immediate reaction of understanding; it is possible, under cover of a careful formalism, to make statements which, if expressed in plain language, the mind would immediately repudiate. There is much to be said, therefore, in favour of understanding the substance of what you are saying all the time, and of never reducing the substantives of your argument to the mental status of an x or y.


CHAPTER    III 

THE    MEASUREMENT    OF   PROBABILITIES 

1.  I  HAVE  spoken  of  probability  as  being  concerned  with  degrees 
of  rational  belief.  This  phrase  implies  that  it  is  in  some  sense 
quantitative  and  perhaps  capable  of  measurement.  The  theory 
of  probable  arguments  must  be  much  occupied,  therefore,  with 
comparisons  of  the  respective  weights  which  attach  to  different 
arguments.  With  this  question  we  will  now  concern  ourselves. 

It  has  been  assumed  hitherto  as  a  matter  of  course  that 
probability  is,  in  the  full  and  literal  sense  of  the  word,  measurable. 
I  shall  have  to  limit,  not  extend,  the  popular  doctrine.  But, 
keeping  my  own  theories  in  the  background  for  the  moment,  I 
will  begin  by  discussing  some  existing  opinions  on  the  subject. 

2. It has been sometimes supposed that a numerical comparison between the degrees of any pair of probabilities is not only conceivable but is actually within our power. Bentham, for instance, in his Rationale of Judicial Evidence,1 proposed a scale on which witnesses might mark the degree of their certainty; and others have suggested seriously a 'barometer of probability.'2

That such comparison is theoretically possible, whether or not we are actually competent in every case to make the comparison, has been the generally accepted opinion. The following quotation3 puts this point of view very well:

"I do not see on what ground it can be doubted that every definite state of belief concerning a proposed hypothesis is in itself capable of being represented by a numerical expression, however difficult or impracticable it may be to ascertain its actual value. It would be very difficult to estimate in numbers the vis viva of all of the particles of a human body at any instant; but no one doubts that it is capable of numerical expression. I mention this because I am not sure that Professor Forbes has distinguished the difficulty of ascertaining numbers in certain cases from a supposed difficulty of expression by means of numbers. The former difficulty is real, but merely relative to our knowledge and skill; the latter, if real, would be absolute and inherent in the subject-matter, which I conceive is not the case."

1 Book i. chap. vi. (referred to by Venn).

2 The reader may be reminded of Gibbon's proposal that: "A Theological Barometer might be formed, of which the Cardinal (Baronius) and our countryman, Dr. Middleton, should constitute the opposite and remote extremities, as the former sunk to the lowest degree of credulity, which was compatible with learning, and the latter rose to the highest pitch of scepticism, in any wise consistent with Religion."

3 W. F. Donkin, Phil. Mag., 1851. He is replying to an article by J. D. Forbes (Phil. Mag., Aug. 1849) which had cast doubt upon this opinion.

De  Morgan  held  the  same  opinion  on  the  ground  that,  wherever 
we  have  differences  of  degree,  numerical  comparison  must  be 
theoretically  possible.1  He  assumes,  that  is  to  say,  that  all 
probabilities  can  be  placed  in  an  order  of  magnitude,  and  argues 
from  this  that  they  must  be  measurable.  Philosophers,  however, 
who  are  mathematicians,  would  no  longer  agree  that,  even  if  the 
premiss  is  sound,  the  conclusion  follows  from  it.  Objects  can 
be  arranged  in  an  order,  which  we  can  reasonably  call  one  of 
degree  or  magnitude,  without  its  being  possible  to  conceive  a 
system  of  measurement  of  the  differences  between  the  individuals. 
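De Morgan's step from "greater and less" to "twice, thrice" can be tested on a toy case. The sketch below assumes three hypothetical degrees, a, b and c, which are known only to stand in the order a < b < c. It gives two numerical labellings, both chosen arbitrarily for illustration. Each labelling respects that order, yet the two disagree about how many times greater c is than a. So the order by itself does not settle a ratio. The labels are not measurements of anything in the text.

```python
# Three hypothetical degrees, known only to stand in the order a < b < c.
labelling_1 = {"a": 1, "b": 2, "c": 3}
# A second labelling obtained by squaring: squaring positive numbers
# preserves their order, so this respects a < b < c just as well.
labelling_2 = {k: v ** 2 for k, v in labelling_1.items()}


def order_of(labelling):
    """Return the items sorted from least to greatest under a labelling."""
    return sorted(labelling, key=labelling.get)


def ratio(labelling, greater, lesser):
    """How many times `greater` is `lesser` under a given labelling."""
    return labelling[greater] / labelling[lesser]


print(order_of(labelling_1) == order_of(labelling_2))               # True
print(ratio(labelling_1, "c", "a"), ratio(labelling_2, "c", "a"))   # 3.0 9.0
```

Both labellings are equally faithful to the ordering. A claim that c is "thrice" a would therefore need information beyond the order itself.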

This opinion may also have been held by others, if not by De Morgan, in part because of the narrow associations which Probability has had for them. The Calculus of Probability has received far more attention than its logic, and mathematicians, under no compulsion to deal with the whole of the subject, have naturally confined their attention to those special cases, the existence of which will be demonstrated at a later stage, where algebraical representation is possible. Probability has become associated, therefore, in the minds of theorists with those problems in which we are presented with a number of exclusive and exhaustive alternatives of equal probability; and the principles, which are readily applicable in such circumstances, have been supposed, without much further enquiry, to possess general validity.

3. It is also the case that theories of probability have been propounded and widely accepted, according to which its numerical character is necessarily involved in the definition. It is often said, for instance, that probability is the ratio of the number of "favourable cases" to the total number of "cases." If this definition is accurate, it follows that every probability can be properly represented by a number and in fact is a number; for a ratio is not a quantity at all. In the case also of definitions based upon statistical frequency, there must be by definition a numerical ratio corresponding to every probability. These definitions and the theories based on them will be discussed in Chapter VIII.; they are connected with fundamental differences of opinion with which it is not necessary to burden the present argument.

1 "Whenever the terms greater and less can be applied, there twice, thrice, etc., can be conceived, though not perhaps measured by us." ("Theory of Probabilities," Encyclopaedia Metropolitana, p. 305.) He is a little more guarded in his Formal Logic, pp. 174, 175; but arrives at the same conclusion so far as probability is concerned.

4. If we pass from the opinions of theorists to the experience of practical men, it might perhaps be held that a presumption in favour of the numerical valuation of all probabilities can be based on the practice of underwriters and the willingness of Lloyd's to insure against practically any risk. Underwriters are actually willing, it might be urged, to name a numerical measure in every case, and to back their opinion with money. But this practice shows no more than that many probabilities are greater or less than some numerical measure, not that they themselves are numerically definite. It is sufficient for the underwriter if the premium he names exceeds the probable risk. But, apart from this, I doubt whether in extreme cases the process of thought, through which he goes before naming a premium, is wholly rational and determinate; or that two equally intelligent brokers acting on the same evidence would always arrive at the same result. In the case, for instance, of insurances effected before a Budget, the figures quoted must be partly arbitrary. There is in them an element of caprice, and the broker's state of mind, when he quotes a figure, is like a bookmaker's when he names odds. Whilst he may be able to make sure of a profit, on the principles of the bookmaker, yet the individual figures that make up the book are, within certain limits, arbitrary. He may be almost certain, that is to say, that there will not be new taxes on more than one of the articles tea, sugar, and whisky; there may be an opinion abroad, reasonable or unreasonable, that the likelihood is in the order whisky, tea, sugar; and he may, therefore, be able to effect insurances for equal amounts in each at 30 per cent, 40 per cent, and 45 per cent. He has thus made sure of a profit of 15 per cent, however absurd and arbitrary his quotations may be. It is not necessary for the success of underwriting on these lines that the probabilities of these new taxes are really measurable by the figures 30/100, 40/100, and 45/100; it is sufficient that there should be merchants willing to insure at these rates. These merchants, moreover, may be wise to insure even if the quotations are partly arbitrary; for they may run the risk of insolvency unless their possible loss is thus limited. That the transaction is in principle one of bookmaking is shown by the fact that, if there is a specially large demand for insurance against one of the possibilities, the rate rises: the probability has not changed, but the "book" is in danger of being upset. A Presidential election in the United States supplies a more precise example. On August 23, 1912, 60 per cent was quoted at Lloyd's to pay a total loss should Dr. Woodrow Wilson be elected, 30 per cent should Mr. Taft be elected, and 20 per cent should Mr. Roosevelt be elected. A broker who could effect insurances in equal amounts against the election of each candidate would be certain at these rates of a profit of 10 per cent. Subsequent modifications of these terms would largely depend upon the number of applicants for each kind of policy. Is it possible to maintain that these figures in any way represent reasoned numerical estimates of probability?
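The bookmaker's arithmetic in the two examples above can be checked with a short sketch. It follows the text in assuming that the broker writes equal sums against every listed outcome, and that at most one claim falls due in full. In the Budget case that means at most one new tax. In the election case it means at most one of the three candidates is elected. The function name `guaranteed_margin` is hypothetical. The result says nothing about whether the individual quotations measure any probability.

```python
def guaranteed_margin(premium_rates_pct, max_claims=1):
    """Worst-case profit, as a percentage of the sum insured on each risk.

    Assumes (hypothetically) that equal sums are insured against every
    outcome and that at most `max_claims` of the outcomes are paid in full.
    """
    collected = sum(premium_rates_pct)   # premiums received on all policies
    worst_payout = 100 * max_claims      # each claim pays 100 per cent of the sum
    return collected - worst_payout


# Budget example: three articles, at most one newly taxed.
budget_margin = guaranteed_margin([30, 40, 45])

# 1912 election example: at most one of the three candidates is elected.
election_margin = guaranteed_margin([60, 30, 20])

print(budget_margin, election_margin)  # 15 10
```

Only the sum of the rates enters the broker's margin. Raising one quotation and lowering another by the same amount leaves the guaranteed profit unchanged. This is one way of seeing why the individual figures that make up the book can be, within limits, arbitrary.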

In some insurances the arbitrary element seems even greater. Consider, for instance, the reinsurance rates for the Waratah, a vessel which disappeared in South African waters. The lapse of time made rates rise; the departure of ships in search of her made them fall; some nameless wreckage is found and they rise; it is remembered that in similar circumstances thirty years ago a vessel floated, helpless but not seriously damaged, for two months, and they fall. Can it be pretended that the figures which were quoted from day to day (75 per cent, 83 per cent, 78 per cent) were rationally determinate, or that the actual figure was not within wide limits arbitrary and due to the caprice of individuals? In fact underwriters themselves distinguish between risks which are properly insurable, either because their probability can be estimated between comparatively narrow numerical limits or because it is possible to make a "book" which covers all possibilities, and other risks which cannot be dealt with in this way and which cannot form the basis of a regular business of insurance, although an occasional gamble may be indulged in. I believe, therefore, that the practice of underwriters weakens rather than supports the contention that all probabilities can be measured and estimated numerically.

5. Another set of practical men, the lawyers, have been more subtle in this matter than the philosophers.1 A distinction, interesting for our present purpose, between probabilities which can be estimated within somewhat narrow limits and those which cannot, has arisen in a series of judicial decisions respecting damages. The following extract2 from the Times Law Reports seems to me to deal very clearly, in a mixture of popular and legal phraseology, with the logical point at issue:

This was an action brought by a breeder of racehorses to recover damages for breach of a contract. The contract was that Cyllene, a racehorse owned by the defendant, should in the season of the year 1909 serve one of the plaintiff's brood mares. In the summer of 1908 the defendant, without the consent of the plaintiff, sold Cyllene for £30,000 to go to South America. The plaintiff claimed a sum equal to the average profit he had made through having a mare served by Cyllene during the past four years. During those four years he had had four colts which had sold at £3300. Upon that basis his loss came to 700 guineas.

Mr. Justice Jelf said that he was desirous, if he properly could, to find some mode of legally making the defendant compensate the plaintiff; but the question of damages presented formidable and, to his mind, insuperable difficulties. The damages, if any, recoverable here must be either the estimated loss of profit or else nominal damages. The estimate could only be based on a succession of contingencies. Thus it was assumed that (inter alia) Cyllene would be alive and well at the time of the intended service; that the mare sent would be well bred and not barren; that she would not slip her foal; and that the foal would be born alive and healthy. In a case of this kind he could only rely on the weighing of chances; and the law generally regarded damages which depended on the weighing of chances as too remote, and therefore irrecoverable. It was drawing the line between an estimate of damage based on probabilities, as in "Simpson v. L. and N.W. Railway Co." (1, Q.B.D., 274), where Cockburn, C.J., said: "To some extent, no doubt, the damage must be a matter of speculation, but that is no reason for not awarding any damages at all," and a claim for damages of a totally problematical character. He (Mr. Justice Jelf) thought the present case was well over the line. Having referred to "Mayne on Damages" (8th ed., p. 70), he pointed out that in "Watson v. Ambergate Railway Co." (15, Jur., 448) Patteson, J., seemed to think that the chance of a prize might be taken into account in estimating the damages for breach of a contract to send a machine for loading barges by railway too late for a show; but Erle, J., appeared to think such damage was too remote. In his Lordship's view the chance of winning a prize was not of sufficiently ascertainable value at the time the contract was made to be within the contemplation of the parties. Further, in the present case, the contingencies were far more numerous and uncertain. He would enter judgment for the plaintiff for nominal damages, which were all he was entitled to. They would be assessed at 1s.

1 Leibniz notes the subtle distinctions made by Jurisconsults between degrees of probability; and in the preface to a work, projected but unfinished, which was to have been entitled Ad stateram juris de gradibus probationum et probabilitatum, he recommends them as models of logic in contingent questions (Couturat, Logique de Leibniz, p. 240).

2 I have considerably compressed the original report (Sapwell v. Bass).

One other similar case may be quoted in further elucidation
of the same point, and because it also illustrates another point:
the importance of making clear the assumptions relative to which
the probability is calculated. This case¹ arose out of an offer of
a Beauty Prize² by the Daily Express. Out of 6000 photographs
submitted, a number were to be selected and published in the
newspaper in the following manner:

¹ Chaplin v. Hicks (1911).

² The prize was to be a theatrical engagement and, according to the article,
the probability of subsequent marriage into the peerage.

The United Kingdom was to be divided into districts and the
photographs of the selected candidates living in each district were
to be submitted to the readers of the paper in the district, who
were to select by their votes those whom they considered the
most beautiful, and a Mr. Seymour Hicks was then to make an
appointment with the 50 ladies obtaining the greatest number
of votes and himself select 12 of them. The plaintiff, who came
out head of one of the districts, submitted that she had not been
given a reasonable opportunity of keeping an appointment, that
she had thereby lost the value of her chance of one of the 12
prizes, and claimed damages accordingly. The jury found that
the defendant had not taken reasonable means to give the
plaintiff an opportunity of presenting herself for selection, and
assessed the damages, provided they were capable of assessment,
at £100, the question of the possibility of assessment being
postponed. This was argued before Mr. Justice Pickford, and
subsequently in the Court of Appeal before Lord Justices Vaughan
Williams, Fletcher Moulton, and Farwell. Two questions arose:
relative to what evidence ought the probability to be calculated,
and was it numerically measurable? Counsel for the
defendant contended that, "if the value of the plaintiff's chance
was to be considered, it must be the value as it stood at the
beginning of the competition, not as it stood after she had been selected
as one of the 50. As 6000 photographs had been sent in, and there
was also the personal taste of the defendant as final arbiter to
be considered, the value of the chance of success was really
incalculable." The first contention, that she ought to be considered
as one of 6000 and not as one of 50, was plainly preposterous and did
not hoodwink the court. But the other point, the personal
taste of the arbiter, presented more difficulty. In estimating
the chance, ought the Court to receive and take account of
evidence respecting the arbiter's preferences in types of beauty?
Mr. Justice Pickford, without illuminating the question, held that
the damages were capable of estimation. Lord Justice Vaughan
Williams in giving judgment in the Court of Appeal argued as
follows:

As he understood it, there were some 50 competitors, and
there were 12 prizes of equal value, so that the average chance
of success was about one in four. It was then said that the
questions which might arise in the minds of the persons who had
to give the decisions were so numerous that it was impossible to
apply the doctrine of averages. He did not agree. Then it
was said that if precision and certainty were impossible in any
case it would be right to describe the damages as unassessable.
He agreed that there might be damages so unassessable that the
doctrine of averages was not possible of application because the
figures necessary to be applied were not forthcoming. Several
cases were to be found in the reports where it had been so held,
but he denied the proposition that because precision and certainty
had not been arrived at, the jury had no function or duty to
determine the damages. ... He (the Lord Justice) denied that
the mere fact that you could not assess with precision and
certainty relieved a wrongdoer from paying damages for his breach of
duty. He would not lay down that in every case it could be left
to the jury to assess the damages; there were cases where the
loss was so dependent on the mere unrestricted volition of another
person that it was impossible to arrive at any assessable loss
from the breach. It was true that there was no market here;
the right to compete was personal and could not be transferred.
He could not admit that a competitor who found herself one of
50 could have gone into the market and sold her right to compete.
At the same time the jury might reasonably have asked
themselves the question whether, if there was a right to compete, it
could have been transferred, and at what price. Under these
circumstances he thought the matter was one for the jury.
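The Lord Justice's "doctrine of averages" is simple arithmetic once the case is simplified. The following is a minimal Python sketch, assuming (as the judgment does) 12 prizes of equal value shared among 50 competitors with equal chances; the function `chance_value` is a hypothetical name introduced only for illustration. It also computes the prize value that the jury's £100 award would imply under that equal-chance assumption.

```python
from fractions import Fraction


def chance_value(prizes: int, competitors: int, prize_value: Fraction) -> Fraction:
    """Value of one competitor's chance under the 'doctrine of averages'.

    Assumption taken from the judgment, not from any evidence about the arbiter:
    every competitor is equally likely to win one of the equal prizes.
    """
    p_win = Fraction(prizes, competitors)
    return p_win * prize_value


p = Fraction(12, 50)
print(p, float(p))  # 6/25 0.24, i.e. "about one in four"

# Prize value at which a 12/50 chance is worth exactly the £100 awarded.
implied_prize = Fraction(100) / p
print(round(float(implied_prize), 2))  # 416.67
```

This is the arithmetic behind the remark, in the footnote on the jury's award, that the jury cannot have reasoned in this way: a £100 award on a twelve-fiftieths chance would require an average prize worth over £400.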

The attitude of the Lord Justice is clear. The plaintiff had
evidently suffered damage, and justice required that she should
be compensated. But it was equally evident that, relative to
the completest information available and account being taken of
the arbiter's personal taste, the probability could be by no means
estimated with numerical precision. Further, it was impossible
to say how much weight ought to be attached to the fact that
the plaintiff had been head of her district (there were fewer than
50 districts); yet it was plain that it made her chance better than
the chances of those of the 50 left in who were not head of their
districts. Let rough justice be done, therefore. Let the case
be simplified by ignoring some part of the evidence. The
"doctrine of averages" is then applicable, or, in other words,
the plaintiff's loss may be assessed at twelve-fiftieths of the
value of the prize.¹

¹ The jury in assessing the damages at £100, however, cannot have argued
so subtly as this; for the average value of a prize (I have omitted the details
bearing on their value) could not have been fairly estimated so high as £400.

6. How does the matter stand, then? Whether or not such
a thing is theoretically conceivable, no exercise of the practical
judgment is possible by which a numerical value can actually
be given to the probability of every argument. So far from
our being able to measure them, it is not even clear that we are
always able to place them in an order of magnitude. Nor has
any theoretical rule for their evaluation ever been suggested.

The doubt, in view of these facts, whether any two
probabilities are in every case even theoretically capable of comparison
in terms of numbers, has not, however, received serious
consideration. There seem to me to be exceedingly strong reasons for
entertaining the doubt. Let us examine a few more instances.

7. Consider an induction or a generalisation. It is usually
held that each additional instance increases the generalisation's
probability. A conclusion which is based on three experiments
in which the unessential conditions are varied is more
trustworthy than if it were based on two. But what reason or
principle can be adduced for attributing a numerical measure to
the increase?¹
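The footnote to this paragraph mentions the rule of succession, which assigns the value (n + 1)/(n + 2) to an induction based on n instances. A short Python sketch (the function names are hypothetical) shows how it produces the odds of 2 to 1 on a single instance that the footnote calls absurd. Note that the rule yields a definite number for any n, whatever the instances are, which is exactly the kind of numerical measure the paragraph questions.

```python
from fractions import Fraction


def rule_of_succession(n: int) -> Fraction:
    """Laplace's rule: probability assigned to an induction from n favourable instances."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return Fraction(n + 1, n + 2)


def odds_in_favour(p: Fraction) -> Fraction:
    """Odds p : (1 - p), expressed as the single ratio p / (1 - p)."""
    return p / (1 - p)


for n in (1, 2, 3, 10):
    p = rule_of_succession(n)
    # The odds come out as n + 1 to 1, regardless of what the instances were.
    print(f"n={n}: probability {p}, odds {odds_in_favour(p)} to 1")
```

For n = 1 this prints a probability of 2/3 and odds of 2 to 1.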

Or, to take another class of instances, we may sometimes
have some reason for supposing that one object belongs to a
certain category if it has points of similarity to other known
members of the category (e.g. if we are considering whether
a certain picture should be ascribed to a certain painter), and
the greater the similarity the greater the probability of our
conclusion. But we cannot in these cases measure the increase;
we can say that the presence of certain peculiar marks in a
picture increases the probability that the artist of whom those
marks are known to be characteristic painted it, but we cannot
say that the presence of these marks makes it two or three or
any other number of times more probable than it would have
been without them. We can say that one thing is more like a
second object than it is like a third; but there will very seldom be
any meaning in saying that it is twice as like. Probability is, so
far as measurement is concerned, closely analogous to similarity.²

¹ It is true that Laplace and others (even amongst contemporary writers)
have believed that the probability of an induction is measurable by means of a
formula known as the rule of succession, according to which the probability of an
induction based on n instances is $\frac{n+1}{n+2}$. Those who have been convinced by
the reasoning employed to establish this rule must be asked to postpone
judgment until it has been examined in Chapter XXX. But we may point out here
the absurdity of supposing that the odds are 2 to 1 in favour of a generalisation
based on a single instance, a conclusion which this formula would seem to
justify.

² There are very few writers on probability who have explicitly admitted
that probabilities, though in some sense quantitative, may be incapable of
numerical comparison. Edgeworth, "Philosophy of Chance" (Mind, 1884, p.
225), admitted that "there may well be important quantitative, although not
numerical, estimates" of probabilities. Goldschmidt (Wahrscheinlichkeitsrechnung,
p. 43) may also be cited as holding a somewhat similar opinion. He
maintains that a lack of comparability in the grounds often stands in the way
of the measurability of the probable in ordinary usage, and that there are not
necessarily good reasons for measuring the value of one argument against
that of another. On the other hand, a numerical statement for the degree of the
probable, although generally impossible, is not in itself contradictory to the
notion; and of three statements, relating to the same circumstances, we can
well say that one is more probable than another, and that one is the most
probable of the three.

Or consider the ordinary circumstances of life. We are out
for a walk: what is the probability that we shall reach home
alive? Has this always a numerical measure? If a thunderstorm
bursts upon us, the probability is less than it was before;
but is it changed by some definite numerical amount? There
might, of course, be data which would make these probabilities
numerically comparable; it might be argued that a knowledge
of the statistics of death by lightning would make such a
comparison possible. But if such information is not included within
the knowledge to which the probability is referred, this fact is
not relevant to the probability actually in question and cannot
affect its value. In some cases, moreover, where general statistics
are available, the numerical probability which might be derived
from them is inapplicable because of the presence of additional
knowledge with regard to the particular case. Gibbon
calculated his prospects of life from the volumes of vital statistics
and the calculations of actuaries. But if a doctor had been called
to his assistance the nice precision of these calculations would
have become useless; Gibbon's prospects would have been better
or worse than before, but he would no longer have been able to
calculate to within a day or week the period for which he then
possessed an even chance of survival.

In these instances we can, perhaps, arrange the probabilities
in an order of magnitude and assert that the new datum
strengthens or weakens the argument, although there is no
basis for an estimate of how much stronger or weaker the new
argument is than the old. But in another class of instances is
it even possible to arrange the probabilities in an order of
magnitude, or to say that one is the greater and the other less?

8. Consider three sets of experiments, each directed towards
establishing a generalisation. The first set is more numerous;
in the second set the irrelevant conditions have been more
carefully varied; in the third case the generalisation in view
is wider in scope than in the others. Which of these
generalisations is on such evidence the most probable? There is, surely,
no answer; there is neither equality nor inequality between
them. We cannot always weigh the analogy against the
induction, or the scope of the generalisation against the bulk of the
evidence in support of it. If we have more grounds than
before, comparison is possible; but, if the grounds in the two
cases are quite different, even a comparison of more and less,
let alone numerical measurement, may be impossible.

This leads up to a contention, which I have heard supported,
that, although not all measurements and not all comparisons of
probability are within our power, yet we can say in the case of
every argument whether it is more or less likely than not. Is our
expectation of rain, when we start out for a walk, always more
likely than not, or less likely than not, or as likely as not? I am
prepared to argue that on some occasions none of these alternatives
hold, and that it will be an arbitrary matter to decide for or
against the umbrella. If the barometer is high, but the clouds are
black, it is not always rational that one should prevail over the
other in our minds, or even that we should balance them,
though it will be rational to allow caprice to determine us and
to waste no time on the debate.

9. Some cases, therefore, there certainly are in which no
rational basis has been discovered for numerical comparison. It
is not the case here that the method of calculation, prescribed
by theory, is beyond our powers or too laborious for actual
application. No method of calculation, however impracticable,
has been suggested. Nor have we any prima facie indications of
the existence of a common unit to which the magnitudes of all
probabilities are naturally referable. A degree of probability
is not composed of some homogeneous material, and is not
apparently divisible into parts of like character with one
another. An assertion that the magnitude of a given
probability is in a numerical ratio to the magnitude of every
other seems, therefore, unless it is based on one of the current
definitions of probability, with which I shall deal separately
in later chapters, to be altogether devoid of the kind of support
which can usually be supplied in the case of quantities of which
the measurability is not open to denial. It will be worth
while, however, to pursue the argument a little further.

10. There appear to be four alternatives. Either in some
cases there is no probability at all; or probabilities do not all
belong to a single set of magnitudes measurable in terms of a
common unit; or these measures always exist, but in many
cases are, and must remain, unknown; or probabilities do
belong to such a set and their measures are capable of being
determined by us, although we are not always able so to
determine them in practice.

11. Laplace and his followers excluded the first two
alternatives. They argued that every conclusion has its place in
the numerical range of probabilities from 0 to 1, if only we knew
it, and they developed their theory of unknown probabilities.

In dealing with this contention, we must be clear as to what
we mean by saying that a probability is unknown. Do we mean
unknown through lack of skill in arguing from given evidence,
or unknown through lack of evidence? The first is alone
admissible, for new evidence would give us a new probability,
not a fuller knowledge of the old one; we have not discovered
the probability of a statement on given evidence by determining
its probability in relation to quite different evidence. We must
not allow the theory of unknown probabilities to gain plausibility
from the second sense. A relation of probability does not yield
us, as a rule, information of much value, unless it invests the
conclusion with a probability which lies between narrow numerical
limits. In ordinary practice, therefore, we do not always regard
ourselves as knowing the probability of a conclusion unless we
can estimate it numerically. We are apt, that is to say, to
restrict the use of the expression probable to these numerical
examples, and to allege in other cases that the probability is
unknown. We might say, for example, that we do not know,
when we go on a railway journey, the probability of death in a
railway accident, unless we are told the statistics of accidents
in former years; or that we do not know our chances in a lottery,
unless we are told the number of the tickets. But it must be
clear upon reflection that if we use the term in this sense (which
is no doubt a perfectly legitimate sense), we ought to say that
in the case of some arguments a relation of probability does not
exist, and not that it is unknown. For it is not this probability
that we have discovered when the accession of new evidence
makes it possible to frame a numerical estimate.

Possibly this theory of unknown probabilities may also gain
strength from our practice of estimating arguments, which, as
I maintain, have no numerical value, by reference to those that
have. We frame two ideal arguments, that is to say, in which
the general character of the evidence largely resembles what is
actually within our knowledge, but which is so constituted as
to yield a numerical value, and we judge that the probability of
the actual argument lies between these two. Since our standards,
therefore, are referred to numerical measures in many cases
where actual measurement is impossible, and since the probability
lies between two numerical measures, we come to believe that it
must also, if only we knew it, possess such a measure itself.
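The practice described here, bracketing an argument between two ideal arguments that do have numerical values, can be sketched in Python. This is a hypothetical model (the `Bracket` class and `compare` function are illustrative names, not the author's notation): it shows that two such bounds settle an order only when the brackets do not overlap, and that a bracket by itself supplies bounds, not a numerical value for the argument inside it.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass(frozen=True)
class Bracket:
    """Numerical values of two ideal arguments judged to bound an actual one."""

    lower: Fraction
    upper: Fraction

    def __post_init__(self) -> None:
        if not (0 <= self.lower <= self.upper <= 1):
            raise ValueError("need 0 <= lower <= upper <= 1")


def compare(a: Bracket, b: Bracket) -> Optional[str]:
    """Order two bracketed probabilities only when their brackets do not overlap."""
    if a.upper < b.lower:
        return "less"
    if b.upper < a.lower:
        return "greater"
    return None  # overlapping brackets: the bounds alone settle nothing


a = Bracket(Fraction(1, 10), Fraction(1, 4))
b = Bracket(Fraction(1, 3), Fraction(1, 2))
c = Bracket(Fraction(1, 5), Fraction(2, 5))
print(compare(a, b))  # less
print(compare(a, c))  # None
```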

12. To say, then, that a probability is unknown ought to
mean that it is unknown to us through our lack of skill in arguing
from given evidence. The evidence justifies a certain degree of
knowledge, but the weakness of our reasoning power prevents our
knowing what this degree is. At the best, in such cases, we only
know vaguely with what degree of probability the premisses invest
the conclusion. That probabilities can be unknown in this sense
or known with less distinctness than the argument justifies
is clearly the case. We can through stupidity fail to make any
estimate of a probability at all, just as we may through the
same cause estimate a probability wrongly. As soon as we
distinguish between the degree of belief which it is rational to
entertain and the degree of belief actually entertained, we have
in effect admitted that the true probability is not known to
everybody.

But this admission must not be allowed to carry us too far.
Probability is, vide Chapter II. (§ 12), relative in a sense to the
principles of human reason. The degree of probability which
it is rational for us to entertain does not presume perfect logical
insight, and is relative in part to the secondary propositions
which we in fact know; and it is not dependent upon whether
more perfect logical insight is or is not conceivable. It is the
degree of probability to which those logical processes lead of
which our minds are capable; or, in the language of Chapter II.,
which those secondary propositions justify which we in fact know.
If we do not take this view of probability, if we do not limit it
in this way and make it, to this extent, relative to human
powers, we are altogether adrift in the unknown; for we cannot
ever know what degree of probability would be justified by the
perception of logical relations which we are, and must always be,
incapable of comprehending.

13. Those who have maintained that, where we cannot assign
a numerical probability, this is not because there is none, but
simply because we do not know it, have really meant, I feel
sure, that with some addition to our knowledge a numerical
value would be assignable, that is to say that our conclusions
would have a numerical probability relative to slightly different
premisses. Unless, therefore, the reader clings to the opinion
that, in every one of the instances I have cited in the earlier
paragraphs of this chapter, it is theoretically possible on that
evidence to assign a numerical value to the probability, we are
left with the first two of the alternatives of § 10, which were
as follows: either in some cases there is no probability at all;
or probabilities do not all belong to a single set of magnitudes
measurable in terms of a common unit. It would be difficult to
maintain that there is no logical relation whatever between
our premiss and our conclusion in those cases where we cannot
assign a numerical value to the probability; and if this is so,
it is really a question of whether the logical relation has
characteristics, other than mensurability, of a kind to justify us in
calling it a probability-relation. Which of the two we favour is,
therefore, partly a matter of definition. We might, that is to
say, pick out from probabilities (in the widest sense) a set, if there
is one, all of which are measurable in terms of a common unit,
and call the members of this set, and them only, probabilities (in
the narrow sense). To restrict the term 'probability' in this
way would be, I think, very inconvenient. For it is possible,
as I shall show, to find several sets, the members of each of
which are measurable in terms of a unit common to all the
members of that set; so that it would be in some degree
arbitrary¹ which we chose. Further, the distinction between
probabilities which would be thus measurable and those which
would not is not fundamental.

¹ Not altogether; for it would be natural to select the set to which the
relation of certainty belongs.

At any rate I aim here at dealing with probability in its
widest sense, and am averse to confining its scope to a limited
type of argument. If the opinion that not all probabilities can
be measured seems paradoxical, it may be due to this divergence
from a usage which the reader may expect. Common usage,
even if it involves, as a rule, a flavour of numerical measurement,
does not consistently exclude those probabilities which are
incapable of it. The confused attempts which have been made
to deal with numerically indeterminate probabilities under the
title of unknown probabilities show how difficult it is to
confine the discussion within the intended limits if the original
definition is too narrow.

14. I maintain, then, in what follows, that there are some pairs
of probabilities between the members of which no comparison
of magnitude is possible; that we can say, nevertheless, of some
pairs of relations of probability that the one is greater and the
other less, although it is not possible to measure the difference
between them; and that in a very special type of case, to be
dealt with later, a meaning can be given to a numerical comparison
of magnitude. I think that the results of observation, of which
examples have been given earlier in this chapter, are consistent
with this account.

By saying that not all probabilities are measurable, I mean
that it is not possible to say of every pair of conclusions, about
which we have some knowledge, that the degree of our rational
belief in one bears any numerical relation to the degree of our
rational belief in the other; and by saying that not all
probabilities are comparable in respect of more and less, I mean that
it is not always possible to say that the degree of our rational
belief in one conclusion is either equal to, greater than, or less
than the degree of our belief in another.

We must now examine a philosophical theory of the
quantitative properties of probability, which would explain and
justify the conclusions which reflection discovers, if the preceding
discussion is correct, in the practice of ordinary argument. We
must bear in mind that our theory must apply to all probabilities
and not to a limited class only, and that, as we do not adopt a
definition of probability which presupposes its numerical
mensurability, we cannot directly argue from differences in degree
to a numerical measurement of these differences. The problem
is subtle and difficult, and the following solution is, therefore,
proposed with hesitation; but I am strongly convinced that
something resembling the conclusion here set forth is true.

15. The so-called magnitudes or degrees of knowledge or
probability, in virtue of which one is greater and another less,
really arise out of an order in which it is possible to place them.
Certainty, impossibility, and a probability which has an
intermediate value, for example, constitute an ordered series in which
the probability lies between certainty and impossibility. In the
same way there may exist a second probability which lies between
certainty and the first probability. When, therefore, we say that
one probability is greater than another, this precisely means that
the degree of our rational belief in the first case lies between
certainty and the degree of our rational belief in the second case.

On this theory it is easy to see why comparisons of more
and less are not always possible. They exist between two
probabilities only when they and certainty all lie on the same ordered
series. But if more than one distinct series of probabilities
exist, then it is clear that only those which belong to the same
series can be compared. If the attribute 'greater' as applied
to one of two terms arises solely out of the relative order of the
terms in a series, then comparisons of greater and less must
always be possible between terms which are members of the
same series, and can never be possible between two terms which
are not members of the same series. Some probabilities are not
comparable in respect of more and less, because there exists
more than one path, so to speak, between proof and disproof,
between certainty and impossibility; and neither of two
probabilities which lie on independent paths bears to the other and
to certainty the relation of 'between' which is necessary for
quantitative comparison.
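The account of §15 can be modelled as a set of ordered paths, each running from impossibility to certainty. The Python sketch below is a hypothetical illustration (the labels `O`, `I`, `p1`, `q1` and the function `compare` are assumptions, not the author's notation): one probability is greater than another when it lies between the other and certainty on a common path, and no comparison exists when the two lie only on different paths. It assumes the paths never order the same pair inconsistently.

```python
from typing import List, Optional

IMPOSSIBILITY, CERTAINTY = "O", "I"

# Hypothetical paths, each ordered from impossibility to certainty.
PATHS: List[List[str]] = [
    [IMPOSSIBILITY, "p1", "p2", CERTAINTY],
    [IMPOSSIBILITY, "q1", CERTAINTY],
]


def compare(paths: List[List[str]], a: str, b: str) -> Optional[str]:
    """Return 'greater', 'less' or 'equal' if a and b share a path, else None.

    'a is greater than b' means a lies between b and certainty on that path.
    """
    for path in paths:
        if a in path and b in path:
            i, j = path.index(a), path.index(b)
            if i == j:
                return "equal"
            return "greater" if i > j else "less"
    return None  # on independent paths: neither more nor less


print(compare(PATHS, "p2", "p1"))  # greater
print(compare(PATHS, "q1", CERTAINTY))  # less
print(compare(PATHS, "p1", "q1"))  # None
```

Because impossibility and certainty lie on every path in this model, each intermediate probability is comparable with them, while probabilities on independent paths are not comparable with each other.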

If we are comparing the probabilities of two arguments,
where the conclusion is the same in both and the evidence of
one exceeds the evidence of the other by the inclusion of some
fact which is favourably relevant, in such a case a relation seems
clearly to exist between the two in virtue of which one lies
nearer to certainty than the other. Several types of argument
can be instanced in which the existence of such a relation is
equally apparent. But we cannot assume its presence in every
case or in comparing in respect of more and less the probabilities
of every pair of arguments.
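The one case this paragraph treats as clearly comparable (the same conclusion, with one body of evidence extended by a favourably relevant fact) can be sketched as a partial comparison. In this hypothetical Python model the evidence is a set of labelled facts, and the caller supplies which facts are favourably relevant to the conclusion; any other difference between the two bodies of evidence is treated, as a deliberate simplification, as giving no comparison.

```python
from typing import FrozenSet, Optional


def compare_evidence(
    h1: FrozenSet[str], h2: FrozenSet[str], favourable: FrozenSet[str]
) -> Optional[str]:
    """Compare two arguments for the SAME conclusion on evidence h1 and h2.

    Returns 'greater' or 'less' when one body of evidence is the other plus facts
    listed as favourably relevant, 'equal' when the evidence is identical, and
    None otherwise (no comparison is asserted).
    """
    if h1 == h2:
        return "equal"
    if h1 > h2 and (h1 - h2) <= favourable:
        return "greater"
    if h2 > h1 and (h2 - h1) <= favourable:
        return "less"
    return None


base = frozenset({"trial 1", "trial 2"})
extra = base | {"trial 3"}
varied = frozenset({"trial 1", "trial 2 with conditions varied"})
fav = frozenset({"trial 3", "trial 2 with conditions varied"})

print(compare_evidence(extra, base, fav))  # greater
print(compare_evidence(extra, varied, fav))  # None
```

The second call mirrors the three sets of experiments of §8: when the grounds differ in kind rather than in extent, the sketch declines to order them.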

16. Analogous instances are by no means rare, in which, by a
convenient looseness, the phraseology of quantity is misapplied
in the same manner as in the case of probability. The simplest
example is that of colour. When we describe the colour of
one object as bluer than that of another, or say that it has more
green in it, we do not mean that there are quantities blue and
green of which the object's colour possesses more or less; we
mean that the colour has a certain position in an order of colours
and that it is nearer some standard colour than is the colour
with which we compare it.

Another example is afforded by the cardinal numbers. We
say that the number three is greater than the number two, but
we do not mean that these numbers are quantities one of which
possesses a greater magnitude than the other. The one is
greater than the other by reason of its position in the order of
numbers; it is further distant from the origin zero. One number
is greater than another if the second number lies between zero
and the first.

But the closest analogy is that of similarity. When we say
of three objects A, B, and C that B is more like A than C is, we
mean, not that there is any respect in which B is in itself quantitatively
greater than C, but that, if the three objects are placed
in  an  order  of  similarity,  B  is  nearer  to  A  than  C  is.  There  are 
also,  as  in  the  case  of  probability,  different  orders  of  similarity. 
For  instance,  a  book  bound  in  blue  morocco  is  more  like  a  book 
bound  in  red  morocco  than  if  it  were  bound  in  blue  calf  ;  and  a 
book  bound  in  red  calf  is  more  like  the  book  in  red  morocco  than 
if  it  were  in  blue  calf.  But  there  may  be  no  comparison  between 
the  degree  of  similarity  which  exists  between  books  bound  in 
red  morocco  and  blue  morocco,  and  that  which  exists  between 
books  bound  in  red  morocco  and  red  calf.  This  illustration 
deserves  special  attention,  as  the  analogy  between  orders  of 
similarity  and  probability  is  so  great  that  its  apprehension  will 
greatly  assist  that  of  the  ideas  I  wish  to  convey.  We  say 
that  one  argument  is  more  probable  than  another  (i.e.  nearer  to 
certainty)  in  the  same  kind  of  way  as  we  can  describe  one  object 
as  more  like  than  another  to  a  standard  object  of  comparison. 

17.  Nothing  has  been  said  up  to  this  point  which  bears  on 
the  question  whether  probabilities  are  ever  capable  of  numerical 
comparison.  It  is  true  of  some  types  of  ordered  series  that 
there are measurable relations of distance between their members
as well as order, and that the relation of one of their members
to an 'origin' can be numerically compared with the relation
of  another  member  to  the  same  origin.  But  the  legitimacy  of 
such  comparisons  must  be  matter  for  special  enquiry  in  each 
case. 

It  will  not  be  possible  to  explain  in  detail  how  and  in  what 
sense a meaning can sometimes be given to the numerical measurement
of probabilities until Part II. is reached. But this chapter
will  be  more  complete  if  I  indicate  briefly  the  conclusions  at 
which  we  shall  arrive  later.  It  will  be  shown  that  a  process 
of  compounding  probabilities  can  be  defined  with  such  properties 
that  it  can  be  conveniently  called  a  process  of  addition.  It  will 
sometimes  be  the  case,  therefore,  that  we  can  say  that  one 
probability  C  is  equal  to  the  sum  of  two  other  probabilities  A 
and B, i.e. C = A + B. If in such a case A and B are equal, then
we may write this C = 2A and say that C is double A. Similarly
if D = C + A, we may write D = 3A, and so on. We can attach a
meaning, therefore, to the equation P = n.A, where P and A are
relations of probability, and n is a number. The relation of
certainty has been commonly taken as the unit of such conventional
measurements. Hence if P represents certainty,
we should say, in ordinary language, that the magnitude of the
probability A is 1/n. It will be shown also that we can define a
process,  applicable  to  probabilities,  which  has  the  properties  of 
arithmetical  multiplication.  Where  numerical  measurement  is 
possible,  we  can  in  consequence  perform  algebraical  operations 
of  considerable  complexity.  The  attention,  out  of  proportion 
to  their  real  importance,  which  has  been  paid,  on  account  of  the 
opportunities  of  mathematical  manipulation  which  they  afford, 
to  the  limited  class  of  numerical  probabilities,  seems  to  be 
a  part  explanation  of  the  belief,  which  it  is  the  principal  object 
of  this  chapter  to  prove  erroneous,  that  all  probabilities  must 
belong  to  it. 
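The numerical case just described can be sketched with exact fractions. This is a minimal illustration only, assuming that probabilities on the numerical strand can be represented as Python `Fraction` values with certainty as the unit, and that the "process of addition" behaves like ordinary addition; the helper `multiple` and the value chosen for A are hypothetical. It checks that C = 2A, D = 3A, and that if certainty = n.A then A has the magnitude 1/n. It says nothing about the non-numerical probabilities which are the main subject of the chapter.

```python
from fractions import Fraction

# Assumption: numerical probabilities are modelled as exact Fractions,
# with certainty taken as the unit of measurement.
CERTAINTY = Fraction(1)

def multiple(n: int, a: Fraction) -> Fraction:
    """Return n.A, i.e. A compounded with itself n times by addition."""
    total = Fraction(0)
    for _ in range(n):
        total += a
    return total

a = Fraction(1, 6)   # hypothetical value of A
c = a + a            # C = A + B with B equal to A, so C = 2A
d = c + a            # D = C + A, so D = 3A

# If certainty = n.A, then A has magnitude 1/n.
n = 6
print(c == multiple(2, a), d == multiple(3, a), multiple(n, CERTAINTY / n) == CERTAINTY)
# True True True
```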

18.  We  must  look,  then,  at  the  quantitative  characteristics  of 
probability  in  the  following  way.  Some  sets  of  probabilities 
we  can  place  in  an  ordered  series,  in  which  we  can  say  of  any 
pair  that  one  is  nearer  than  the  other  to  certainty,  that  the 
argument  in  one  case  is  nearer  proof  than  in  the  other,  and  that 
there  is  more  reason  for  one  conclusion  than  for  the  other.  But 
we can only build up these ordered series in special cases. If we
are given two distinct arguments, there is no general presumption
that their two probabilities and certainty can be placed
in  an  order.  The  burden  of  establishing  the  existence  of  such 
an  order  lies  on  us  in  each  separate  case.  An  endeavour  will 
be  made  later  to  explain  in  a  systematic  way  how  and  in 
what  circumstances  such  orders  can  be  established.  The 
argument  for  the  theory  here  proposed  will  then  be  strengthened. 
For  the  present  it  has  been  shown  to  be  agreeable  to  common 
sense  to  suppose  that  an  order  exists  in  some  cases  and  not  in 
others. 

19.  Some  of  the  principal  properties   of  ordered   series  of 
probabilities  are  as  follows  : 

(i.)  Every  probability  lies  on  a  path  between  impossibility 
and  certainty  ;  it  is  always  true  to  say  of  a  degree 
of  probability,  which  is  not  identical  either  with 
impossibility  or  with  certainty,  that  it  lies  between 
them.  Thus  certainty,  impossibility  and  any  other 
degree  of  probability  form  an  ordered  series.  This 
is  the  same  thing  as  to  say  that  every  argument 
amounts to proof, or disproof, or occupies an intermediate
position.

(ii.)  A  path  or  series,  composed  of  degrees  of  probability, 
is  not  in  general  compact.  It  is  not  necessarily  true, 
that  is  to  say,  that  any  pair  of  probabilities  in  the 
same  series  have  a  probability  between  them. 

(iii.)  The  same  degree  of  probability  can  lie  on  more  than 
one  path  (i.e.  can  belong  to  more  than  one  series). 
Hence,  if  B  lies  between  A  and  C,  and  also  lies  between 
A'  and  C',  it  does  not  follow  that  of  A  and  A'  either  lies 
between  the  other  and  certainty.  The  fact,  that  the 
same  probability  can  belong  to  more  than  one  distinct 
series,  has  its  analogy  in  the  case  of  similarity. 

(iv.)  If  ABC  forms  an  ordered  series,  B  lying  between  A 
and  C,  and  BCD  forms  an  ordered  series,  C  lying  between 
B  and  D,  then  ABCD  forms  an  ordered  series,  B  lying 
between  A  and  D. 

20.  The  different  series  of  probabilities  and  their  mutual  rela 
tions  can  be  most  easily  pictured  by  means  of  a  diagram.    Let  us 
represent  an  ordered  series  by  points  lying  upon  a  path,  all  the 
points on a given path belonging to the same series. It follows
from (i.) that the points O and I, representing the relations of
impossibility and certainty, lie on every path, and that all paths
lie wholly between these points. It follows from (iii.) that the
same point can lie on more than one path. It is possible, therefore,
for paths to intersect and cross. It follows from (iv.) that
the probability represented by a given point is greater than that
represented by any other point which can be reached by passing
along a path with a motion constantly towards the point of
impossibility, and less than that represented by any point which
can be reached by moving along a path towards the point of
certainty. As there are independent paths there will be some
pairs of points representing relations of probability such that we
cannot reach one by moving from the other along a path always
in the same direction.

These properties are illustrated in the annexed diagram.
O represents impossibility, I certainty, and A a numerically
measurable probability intermediate between O and I; U,
V, W, X, Y, Z are non-numerical probabilities, of which, however,
V is less than the numerical probability A, and is also less
than W, X, and Y. X and Y are both greater than W, and greater
than V, but are not comparable with one another, or with A.
V and Z are both less than W, X, and Y, but are not comparable
with one another; U is not quantitatively comparable with any
of the probabilities V, W, X, Y, Z. Probabilities which are numerically
comparable will all belong to one series, and the path of this
series, which we may call the numerical path or strand, will be
represented by OAI.
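The comparisons read off the diagram can be sketched as reachability in a directed graph. This is a hedged model, not the diagram itself: the edge list `EDGES` is a hypothetical reconstruction from the comparisons stated above, and an edge p -> q is read as "q lies beyond p on some path, moving towards certainty". Because reachability is transitive, chaining paths mirrors property (iv.), while pairs joined by no monotone route come out as incomparable.

```python
# Hypothetical reconstruction of the annexed diagram from the comparisons
# stated in the text. An edge p -> q reads "q lies beyond p on some path,
# moving towards certainty"; O is impossibility and I is certainty.
EDGES = {
    "O": ["A", "U", "V", "Z"],
    "V": ["A", "W"],
    "Z": ["W"],
    "W": ["X", "Y"],
    "A": ["I"],   # O -> A -> I is the numerical path or strand
    "U": ["I"],
    "X": ["I"],
    "Y": ["I"],
    "I": [],
}

def less(p: str, q: str) -> bool:
    """True if q is reachable from p by always moving towards certainty.
    Reachability is transitive, which mirrors property (iv.)."""
    stack, seen = list(EDGES[p]), set()
    while stack:
        node = stack.pop()
        if node == q:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(EDGES[node])
    return False

def comparable(p: str, q: str) -> bool:
    """Two probabilities are comparable only if a monotone path joins them."""
    return p == q or less(p, q) or less(q, p)

print(less("V", "A"), less("Z", "X"), comparable("X", "Y"), comparable("U", "W"))
# True True False False
```

Only the edges needed for the stated comparisons are included; comparisons the text leaves open, such as Z against A, simply come out as incomparable in this model.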

21.    The   chief  results  which  have    been   reached   so   far  are 
collected  together  below,  and  expressed  with  precision  : — 

(i.)  There  are  amongst  degrees  of  probability  or  rational 
belief  various  sets,  each  set  composing  an  ordered 
series.  These  series  are  ordered  by  virtue  of  a  relation 
of 'between.' If B is 'between' A and C, ABC form a
series.

(ii.) There are two degrees of probability O and I between
which all other probabilities lie. If, that is to say, A
is a probability, OAI form a series. O represents impossibility
and I certainty.

(iii.) If A lies between O and B, we may write this AB,
so that OA and AI are true for all probabilities.

(iv.) If AB, the probability B is said to be greater than
the probability A, and this can be expressed by B > A.

(v.) If the conclusion a bears the relation of probability
P to the premiss h, or if, in other words, the hypothesis
h invests the conclusion a with probability P, this may
be written aPh. It may also be written a/h = P.

This latter expression, which proves to be the more useful of the
two for most purposes, is of fundamental importance. If aPh
and a'Ph', i.e. if the probability of a relative to h is the
same as the probability of a' relative to h', this may be written
a/h = a'/h'. The value of the symbol a/h, which represents
what is called by other writers 'the probability of a', lies in
the fact that it contains explicit reference to the data to which
the probability relates the conclusion, and avoids the numerous
errors which have arisen out of the omission of this reference.


CHAPTER  IV 

THE    PRINCIPLE    OF   INDIFFERENCE 

ABSOLUTE. 'Sure, Sir, this is not very reasonable, to summon my affection
for a lady I know nothing of.'
SIR ANTHONY. 'I am sure, Sir, 'tis more unreasonable in you to object
to a lady you know nothing of.'1

1.  IN  the  last  chapter  it  was  assumed  that  in  some  cases  the 
probabilities  of  two  arguments  may  be  equal.  It  was  also  argued 
that  there  are  other  cases  in  which  one  probability  is,  in  some 
sense,  greater  than  another.  But  so  far  there  has  been  nothing 
to  show  how  we  are  to  know  when  two  probabilities  are  equal  or 
unequal.  The  recognition  of  equality,  when  it  exists,  will  be 
dealt  with  in  this  chapter,  and  the  recognition  of  inequality  in 
the  next.  An  historical  account  of  the  various  theories  about 
this  problem,  which  have  been  held  from  time  to  time,  will  be 
given  in  Chapter  VII. 

2.  The  determination  of  equality  between  probabilities  has 
received  hitherto  much  more  attention  than  the  determination 
of  inequality.  This  has  been  due  to  the  stress  which  has  been 
laid  on  the  mathematical  side  of  the  subject.  In  order  that 
numerical  measurement  may  be  possible,  we  must  be  given  a 
number  of  equally  probable  alternatives.  The  discovery  of  a 
rule, by which equiprobability could be established, was, therefore,
essential. A rule, adequate to the purpose, introduced by
James  Bernoulli,  who  was  the  real  founder  of  mathematical 
probability,2  has  been  widely  adopted,  generally  under  the 
title  of  The  Principle  of  Non-Sufficient  Reason,  down  to  the 
present  time.  This  description  is  clumsy  and  unsatisfactory, 
and, if it is justifiable to break away from tradition, I prefer to
call  it  The  Principle  of  Indifference. 

1 Quoted by Mr. Bosanquet with reference to the Principle of Non-Sufficient
Reason.
2 See also Chap. VII.
The Principle of Indifference asserts that if there is no known
reason  for  predicating  of  our  subject  one  rather  than  another  of 
several  alternatives,  then  relatively  to  such  knowledge  the 
assertions  of  each  of  these  alternatives  have  an  equal  probability. 
Thus  equal  probabilities  must  be  assigned  to  each  of  several 
arguments,  if  there  is  an  absence  of  positive  ground  for  assigning 
unequal  ones. 

This  rule,  as  it  stands,  may  lead  to  paradoxical  and  even 
contradictory  conclusions.  I  propose  to  criticise  it  in  detail, 
and  then  to  consider  whether  any  valid  modification  of  it  is 
discoverable.  For  several  of  the  criticisms  which  follow  I  am 
much indebted to Von Kries's Die Principien der Wahrscheinlichkeit.1

3.  If  every  probability  was  necessarily  either  greater  than, 
equal  to,  or  less  than  any  other,  the  Principle  of  Indifference 
would  be  plausible.     For  if  the  evidence  affords  no  ground  for 
attributing  unequal  probabilities  to  the  alternative  predications, 
it  seems  to  follow  that  they  must  be  equal.     If,  on  the  other  hand, 
there need be neither equality nor inequality between probabilities,
this method of reasoning fails. Apart, however, from
this  objection,  which  is  based  on  the  arguments  of  Chapter  III., 
the  plausibility  of  the  principle  will  be  most  easily  shaken  by  an 
exhibition  of  the  contradictions  which  it  involves.     These  fall 
under  three  or  four  distinct  heads.     In  §§  4-9  my  criticism  will 
be  purely  destructive,  and  I  shall  not  attempt  in  these  paragraphs 
to  indicate  my  own  way  out  of  the  difficulties. 

4.  Consider  a  proposition,  about  the  subject  of  which  we  know 
only  the  meaning,  and  about  the  truth  of  which,  as  applied  to 
this  subject,  we  possess  no  external  relevant  evidence.     It  has 
been  held  that  there  are  here  two   exhaustive  and  exclusive 
alternatives (the truth of the proposition and the truth of its
contradictory), while our knowledge of the subject affords no
ground for preferring one to the other. Thus if a and ā are
contradictories, about the subject of which we have no outside
knowledge, it is inferred that the probability of each is ½.2 In

1  Published  in  1886.     A  brief  account  of  Von  Kries's  principal  conclusions 
will  be  given  on  p.  87.     A  useful  summary  of  his  book  will  be  found  in  a  review 
by Meinong, published in the Göttingische gelehrte Anzeigen for 1890 (pp. 56-75).

2  Cf.  (e.g.)  the  well-known  passage  in  Jevons's  Principles  of  Science,  vol.  i. 
p. 243, in which he assigns the probability ½ to the proposition "A Platythliptic
Coefficient is positive." Jevons points out, by way of proof, that no other
the same way the probabilities of two other propositions, b and c,
having the same subject as a, may be each ½. But without
having  any  evidence  bearing  on  the  subject  of  these  propositions 
we may know that the predicates are contraries amongst themselves,
and, therefore, exclusive alternatives, a supposition which
leads  by  means  of  the  same  principle  to  values  inconsistent  with 
those  just  obtained.  If,  for  instance,  having  no  evidence  relevant 
to the colour of this book, we could conclude that ½ is the probability
of 'This book is red,' we could conclude equally that the
probability of each of the propositions 'This book is black' and
'This book is blue' is also ½. So that we are faced with the
impossible case of three exclusive alternatives all as likely as not.
A defender of the Principle of Indifference might rejoin that we
are assuming knowledge of the proposition: 'Two different
colours cannot be predicated of the same subject at the same
time'; and that, if we know this, it constitutes relevant outside
evidence. But such evidence is about the predicate, not
about  the  subject.  Thus  the  defender  of  the  Principle  will  be 
driven  on,  either  to  confine  it  to  cases  where  we  know  nothing 
about  either  the  subject  or  the  predicate,  which  would  be  to 
emasculate  it  for  all  practical  purposes,  or  else  to  revise  and 
amplify  it,  which  is  what  we  propose  to  do  ourselves. 
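The contradiction in the colour example is plain arithmetic. The sketch below assumes, as the crude principle would, that each of 'This book is red', 'This book is black' and 'This book is blue' is set only against its own contradictory and so receives ½; it then shows that three exclusive alternatives would share more than certainty between them.

```python
from fractions import Fraction

# Crude Principle of Indifference: each colour proposition is set only
# against its own contradictory, and so receives 1/2.
crude = {colour: Fraction(1, 2) for colour in ("red", "black", "blue")}

# The three colours are exclusive, so together they cannot exceed certainty.
total = sum(crude.values())
print(total, total <= 1)  # 3/2 False
```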

The  difficulty  cannot  be  met  by  saying  that  we  must  know 
and  take  account  of  the  number  of  possible  contraries.  For  the 
number of contraries to any proposition on any evidence is always
infinite; āb is contrary to a for all values of b. The same point
can be put in a form which does not involve contraries or
contradictories. For example, a/h = ½ and ab/h = ½, if h is

probability could reasonably be given. This, of course, involves the assumption
that every proposition must have some numerical probability. Such a contention
was first criticised, so far as I am aware, by Bishop Terrot in the Edin.
Phil. Trans. for 1856. It was deliberately rejected by Boole in his last published
work on probability: "It is a plain consequence," he says (Edin. Phil.
Trans. vol. xxi. p. 624), "of the logical theory of probabilities, that the state
of expectation which accompanies entire ignorance of an event is properly
represented, not by the fraction ½, but by the indefinite form 0/0." Jevons's
particular example, however, is also open to the objection that we do not even
know the meaning of the subject of the proposition. Would he maintain that
there is any sense in saying that for those who know no Arabic the probability
of every statement expressed in Arabic is even? How far has he been
influenced in the choice of his example by known characteristics of the predicate
'positive'? Would he have assigned the probability ½ to the proposition
'A Platythliptic Coefficient is a perfect cube'? What about the proposition
'A Platythliptic Coefficient is allogeneous'?
irrelevant both to a and to b, in the sense required by the crude
Principle of Indifference.1 It follows from this that, if a is true,
b must be true also. If it follows from the absence of positive
data that 'A is a red book' has a probability of ½, and that the
probability of 'A is red' is also ½, then we may deduce that, if
A  is  red,  it  must  certainly  be  a  book. 
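The step from a/h = ½ and ab/h = ½ to the certainty of b given a relies on the rule ab/h = a/h . b/ah. That multiplicative rule is only established for numerical probabilities later in the book, so the sketch below simply assumes it here. The helper `conditional` is hypothetical; a stands for 'A is red' and b for 'A is a book'.

```python
from fractions import Fraction

def conditional(joint: Fraction, given: Fraction) -> Fraction:
    """Solve ab/h = a/h * b/ah for b/ah (assumes a/h is not zero)."""
    return joint / given

a_on_h = Fraction(1, 2)    # a = 'A is red', on data h irrelevant to it
ab_on_h = Fraction(1, 2)   # ab = 'A is a red book', on the same data
b_on_ah = conditional(ab_on_h, a_on_h)
print(b_on_ah)  # 1
```

A value of 1 for b/ah is certainty: on these crude assignments, anything red would have to be a book.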

We  may  take  it,  then,  that  the  probability  of  a  proposition, 
about  the  subject  of  which  we  have  no  extraneous  evidence,  is 
not necessarily ½. Whether or not this conclusion discredits the
Principle  of  Indifference,  it  is  important  on  its  own  account,  and 
will  help  later  on  to  confute  some  famous  conclusions  of  Laplace's 
school. 

5.  Objection  can  now  be  made  in  a  somewhat  different  shape. 
Let  us  suppose  as  before  that  there  is  no  positive  evidence  relating 
to  the  subjects  of  the  propositions  under  examination  which 
would  lead  us  to  discriminate  in  any  way  between  certain 
alternative  predicates.  If,  to  take  an  example,  we  have  no 
information  whatever  as  to  the  area  or  population  of  the 
countries  of  the  world,  a  man  is  as  likely  to  be  an  inhabitant 
of  Great  Britain  as  of  France,  there  being  no  reason  to  prefer 
one  alternative  to  the  other.2  He  is  also  as  likely  to  be  an 
inhabitant  of  Ireland  as  of  France.  And  on  the  same  principle 
he  is  as  likely  to  be  an  inhabitant  of  the  British  Isles  as  of 
France.  And  yet  these  conclusions  are  plainly  inconsistent. 
For  our  first  two  propositions  together  yield  the  conclusion 
that  he  is  twice  as  likely  to  be  an  inhabitant  of  the  British 
Isles  as  of  France. 

Unless  we  argue,  as  I  do  not  think  we  can,  that  the  knowledge 
that  the  British  Isles  are  composed  of  Great  Britain  and  Ireland 
is  a  ground  for  supposing  that  a  man  is  more  likely  to  inhabit 
them  than  France,  there  is  no  way  out  of  the  contradiction.  It 
is  not  plausible  to  maintain,  when  we  are  considering  the  relative 
populations of different areas, that the number of names of subdivisions
which are within our knowledge, is, in the absence of
any  evidence  as  to  their  size,  a  piece  of  relevant  evidence. 

At  any  rate,  many  other  similar  examples  could  be  invented, 

1 a/h stands for 'the probability of a on hypothesis h.'

2  This  example  raises  a  difficulty  similar  to  that  raised  by  Von  Kries's 
example  of  the  meteor.     Stumpf  has  propounded  an  invalid  solution  of  Von 
Kries's  difficulty.     Against  the  example  proposed  here,  Stumpf's  solution  has 
less  plausibility  than  against  Von  Kries's. 
which would require a special explanation in each case; for the
above  is  an  instance  of  a  perfectly  general  difficulty.  The 
possible  alternatives  may  be  a,  b,  c,  and  d,  and  there  may  be  no 
means  of  discriminating  between  them  ;  but  equally  there  may 
be no means of discriminating between (a or b), c, and d.
This difficulty could be made striking in a variety of ways, but
it will be better to criticise the principle further from a somewhat
different side.
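The dependence of the crude principle on how the alternatives happen to be listed can be shown directly. In this sketch the helpers `indifferent` and `prob` are hypothetical: equal shares are assigned first over Great Britain, Ireland and France, and then over the British Isles and France, and the same region receives two different probabilities.

```python
from fractions import Fraction

def indifferent(alternatives):
    """Crude Principle of Indifference (hypothetical helper): equal shares
    for each alternative in the list, whatever the list happens to be."""
    share = Fraction(1, len(alternatives))
    return {alt: share for alt in alternatives}

def prob(assignment, cases):
    """Probability of a region made up of the listed cases."""
    return sum(assignment[c] for c in cases)

fine = indifferent(["Great Britain", "Ireland", "France"])
coarse = indifferent(["British Isles", "France"])

p_fine = prob(fine, ["Great Britain", "Ireland"])   # British Isles, via its parts
p_coarse = prob(coarse, ["British Isles"])          # British Isles, taken whole
print(p_fine, p_coarse)  # 2/3 1/2
```

On the finer list the British Isles come out twice as likely as France; on the coarser list they come out equally likely, although nothing relevant to size has been learnt in between.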

6.  Consider  the  specific  volume  of  a  given  substance.1  Let  us 
suppose  that  we  know  the  specific  volume  to  lie  between  1  and  3, 
but  that  we  have  no  information  as  to  whereabouts  in  this  interval 
its  exact  value  is  to  be  found.  The  Principle  of  Indifference 
would  allow  us  to  assume  that  it  is  as  likely  to  lie  between  1  and 
2  as  between  2  and  3  ;  for  there  is  no  reason  for  supposing  that  it 
lies  in  one  interval  rather  than  in  the  other.  But  now  consider 
the  specific  density.  The  specific  density  is  the  reciprocal  of 
the specific volume, so that if the latter is v the former is 1/v.
Our data remaining as before, we know that the specific density
must lie between 1 and ⅓, and, by the same use of the Principle
of Indifference as before, that it is as likely to lie between
1 and ⅔ as between ⅔ and ⅓. But the specific volume being
a determinate function of the specific density, if the latter lies
between 1 and ⅔, the former lies between 1 and 1½, and if the
latter lies between ⅔ and ⅓, the former lies between 1½ and 3.
It follows, therefore, that the specific volume is as likely to lie
between 1 and 1½ as between 1½ and 3; whereas we have already
proved, relatively to precisely the same data, that it is as likely
to lie between 1 and 2 as between 2 and 3. Moreover, any other
function  of  the  specific  volume  would  have  suited  our  purpose 
equally  well,  and  by  a  suitable  choice  of  this  function  we  might 
have  proved  in  a  similar  manner  that  any  division  whatever 
of  the  interval  1  to  3  yields  sub-intervals  of  equal  probability. 
Specific  volume  and  specific  density  are  simply  alternative 
methods  of  measuring  the  same  objective  quantity  ;  and  there 
are  many  methods  which  might  be  adopted,  each  yielding  on  the 
application  of  the  Principle  of  Indifference  a  different  probability 
for  a  given  objective  variation  in  the  quantity.2 
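The specific volume example can be checked numerically. Assuming, as in the text, that the specific volume v lies between 1 and 3 and that the specific density is 1/v, the sketch applies the crude principle once to equal intervals of v and once to equal intervals of 1/v; the function names are hypothetical.

```python
from fractions import Fraction

# Specific volume v lies in [1, 3]; specific density is 1/v, in [1/3, 1].
V_LO, V_HI = Fraction(1), Fraction(3)

def p_volume_below(x: Fraction) -> Fraction:
    """P(v < x) if indifference is applied to equal intervals of volume."""
    return (x - V_LO) / (V_HI - V_LO)

def p_volume_below_via_density(x: Fraction) -> Fraction:
    """P(v < x) if indifference is applied to equal intervals of density.
    v < x holds exactly when the density 1/v exceeds 1/x."""
    d_lo, d_hi = 1 / V_HI, 1 / V_LO
    return (d_hi - 1 / x) / (d_hi - d_lo)

print(p_volume_below(Fraction(2)), p_volume_below_via_density(Fraction(2)))  # 1/2 3/4
print(p_volume_below_via_density(Fraction(3, 2)))  # 1/2
```

The same data give P(v < 2) as ½ on one convention and ¾ on the other, and the density convention puts the even split at v = 1½, as the text finds. Any other strictly monotone function of v would yield yet another answer.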

1 This example is taken from Von Kries, op. cit. p. 24. Von Kries does
not  seem  to  me  to  explain  correctly  how  the  contradiction  arises. 

2 A. Nitsche ("Die Dimensionen der Wahrscheinlichkeit und die Evidenz der
Ungewissheit," Vierteljahrsschr. f. wissensch. Philos. vol. xvi. p. LMJ, ibDi.'), in
The arbitrary nature of particular methods of measurement
of this and of many other physical quantities is easily explained.
The objective quality measured may not, strictly speaking, possess
numerical quantitativeness, although it has the properties necessary
for measurement by means of correlation with numbers.
The  values  which  it  can  assume  may  be  capable  of  being 
ranged  in  an  order,  and  it  will  sometimes  happen  that  the  series 
which  is  thus  formed  is  continuous,  so  that  a  value  can  always 
be  found  whose  order  in  the  series  is  between  any  two  selected 
values  ;  but  it  does  not  follow  from  this  that  there  is  any  meaning 
in  the  assertion  that  one  value  is  twice  another  value.  The 
relations  of  continuous  order  can  exist  between  the  terms  of  a 
series of values, without the relations of numerical quantitativeness
necessarily existing also, and in such cases we can adopt a
largely  arbitrary  measure  of  the  successive  terms,  which  yields 
results  which  may  be  satisfactory  for  many  purposes,  those, 
for  instance,  of  mathematical  physics,  though  not  for  those  of 
probability.  This  method  is  to  select  some  other  series  of 
quantities  or  numbers,  each  of  the  terms  of  which  corresponds 
in  order  to  one  and  only  one  of  the  terms  of  the  series  which 
we wish to measure. For instance, the series of characteristics,
differing in degree, which are measured by specific
volume,  have  this  relation  to  the  series  of  numerical  ratios 
between  the  volumes  of  equal  masses  of  the  substances,  the 
specific  volumes  of  which  are  in  question,  and  of  water.  They 
have  it  also  to  the  corresponding  ratios  which  give  rise  to  the 
measure  of  specific  density.  But  these  only  yield  conventional 
measurements,  and  the  numbers  with  which  we  correlate  the 

criticising  Von  Kries,  argues  that  the  alternatives  to  which  the  principle  must 
be  applied  are  the  smallest  physically  distinguishable  intervals,  and  that  the 
probability  of  the  specific  volume's  lying  within  a  certain  range  of  values  turns 
on  the  number  of  such  distinguishable  intervals  in  the  range.  This  procedure 
might  conceivably  provide  the  correct  method  of  computation,  but  it  does  not 
therefore  restore  the  credit  of  the  Principle  of  Indifference.  For  it  is  argued, 
not  that  the  results  of  applying  the  principle  are  always  wrong,  but  that  it  does 
not  lead  unambiguously  to  the  correct  procedure.  If  we  do  not  know  the 
number  of  distinguishable  intervals  we  have  no  reason  for  supposing  that  the 
specific  volume  lies  between  1  and  2  rather  than  2  and  3,  and  the  principle  can 
therefore  be  applied  as  it  has  been  applied  above.  And  even  if  we  do  know 
the  number  and  reckon  intervals  as  equal  which  contain  an  equal  number  of 
'physically distinguishable' parts, is it certain that this does not simply
provide us with a new system of measurement, which has the same conventional
basis as the methods of specific volume and specific density, and is no
more the one correct measure than these are?
terms which we wish to measure can be selected in a variety of
ways.  It  follows  that  equal  intervals  between  the  numbers 
which  represent  the  ratios  do  not  necessarily  correspond  to  equal 
intervals  between  the  qualities  under  measurement ;  for  these 
numerical differences depend upon which convention of measurement
we have selected.

7.  A  somewhat  analogous  difficulty  arises  in  connection  with 
the problems of what is known as 'geometrical' or 'local'
probability.1 In these problems we are concerned with the position
of a point or infinitesimal area or volume within a continuum.2
The number of cases here is indefinite, but the Principle
of  Indifference  has  been  held  to  justify  the  supposition  that  equal 
lengths  or  areas  or  volumes  of  the  continuum  are,  in  the  absence 
of  discriminating  evidence,  equally  likely  to  contain  the  point. 
It  has  long  been  known  that  this  assumption  leads  in  numerous 
cases to contradictory conclusions. If, for instance, two points
A and A' are taken at random on the surface of a sphere, and we
seek the probability that the lesser of the two arcs of the great
circle  AA'  is  less  than  a,  we  get  one  result  by  assuming  that  the 
probability  of  a  point's  lying  on  a  given  portion  of  the  sphere's 
surface  is  proportional  to  the  area  of  that  portion,  and  another 
result  by  assuming  that,  if  a  point  lies  on  a  given  great  circle,  the 
probability  of  its  lying  on  a  given  arc  of  that  circle  is  proportional 
to  the  length  of  the  arc,  each  of  these  assumptions  being  equally 
justified  by  the  Principle  of  Indifference. 
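The two results for the sphere can be written in closed form. Assuming A is fixed and the lesser arc AA' is measured as an angle α in radians (0 ≤ α ≤ π), uniform probability over the surface gives the area of a spherical cap, (1 - cos α)/2, while uniform probability along the great circle through A gives α/π. The function names are hypothetical.

```python
import math

def p_by_area(alpha: float) -> float:
    """A' uniform over the sphere's surface: A' must fall in the spherical
    cap of angular radius alpha centred on A."""
    return (1 - math.cos(alpha)) / 2

def p_by_arc(alpha: float) -> float:
    """A' uniform along a great circle through A: A' must fall within
    an arc of length alpha on either side of A, out of 2*pi in all."""
    return 2 * alpha / (2 * math.pi)

alpha = math.pi / 3
print(round(p_by_area(alpha), 4), round(p_by_arc(alpha), 4))  # 0.25 0.3333
```

The two assumptions agree at α = π/2 and nowhere else strictly between 0 and π.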

Or  consider  the  following  problem  :  if  a  chord  in  a  circle  is 
drawn  at  random,  what  is  the  probability  that  it  will  be  less 
than the side of the inscribed equilateral triangle? One can
argue:

(a)  It  is  indifferent  at  what  point  one  end  of  the  chord  lies. 
If we suppose this end fixed, the direction is then

1 The best accounts of this subject are to be found in Czuber, Geometrische
Wahrscheinlichkeiten und Mittelwerte; Czuber, Wahrscheinlichkeitsrechnung,
vol. i. pp. 75-109; Crofton, Encycl. Brit. (9th edit.), article 'Probability';
Borel, Éléments de la théorie des probabilités, chaps. vi.-viii.; a few other
references are given in the following pages, and a number of discussions of
individual problems will be found in the mathematical volumes of the
Educational Times. The interest of the subject is primarily mathematical,
and no discussion of its principal problems will be attempted here.

2  As  Czubcr   points   out  ( Wahrscheinlichkeitsrecknung,   vol.    i.   p.   84),   all 
problems,  whether  geometrical  or  arithmetical,  which  deal  with  a  continuum 
and  with  non-enumerable  aggregates,  are  commonly  discussed  under  the  name  of 
'geometrical  probability.'     Sec  also  Larmnel,  Untcrsuchungcn. 
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chosen  at  random.  In  this  case  the  answer  is  easily 
shown  to  be  f . 

(6)  It  is  indifferent  in  what  direction  we  suppose  the  chord 
to  lie.  Beginning  with  this  apparently  not  less  justifi 
able  assumption,  we  find  that  the  answer  is  |. 

(c)  To  choose  a  chord  at  random,  one  must  choose  its 
middle  point  at  random.  If  the  chord  is  to  be  less 
than  the  side  of  the  inscribed  equilateral  triangle,  the 
middle  point  must  be  at  a  greater  distance  from  the 
centre  than  half  the  radius.  But  the  area  at  a 
greater  distance  than  this  is  £  of  the  whole.  Hence 


our  answer  is  £. 
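Each of the three recipes, and each of the two assumptions about the sphere, is a perfectly definite random procedure, and a short simulation makes the disagreement concrete. The Python sketch below is an illustration added here, not part of the original argument. Its function names are hypothetical, the circle is taken to have unit radius, and recipe (b) is read as making the chord's distance from the centre uniform, which is the reading that yields ½.

```python
import math
import random

# Side of the equilateral triangle inscribed in a circle of radius 1.
SIDE = math.sqrt(3.0)


def chord_a(rng):
    """(a) Fix one end; the angle t between chord and tangent is uniform on (0, pi).

    A chord making angle t with the tangent at its end has length 2 sin t.
    """
    t = rng.uniform(0.0, math.pi)
    return 2.0 * math.sin(t)


def chord_b(rng):
    """(b) Fix the direction; the distance d from the centre is uniform on (0, 1)."""
    d = rng.uniform(0.0, 1.0)
    return 2.0 * math.sqrt(1.0 - d * d)


def chord_c(rng):
    """(c) Choose the middle point uniformly over the area of the disc."""
    while True:  # rejection sampling from the enclosing square
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        r2 = x * x + y * y
        if r2 < 1.0:
            return 2.0 * math.sqrt(1.0 - r2)


def chance_shorter_than_side(method, trials=100_000, seed=0):
    """Monte Carlo estimate of P(chord < side of the inscribed triangle)."""
    rng = random.Random(seed)
    return sum(method(rng) < SIDE for _ in range(trials)) / trials


def sphere_by_area(alpha):
    """P(lesser arc AA' < alpha) if probability is proportional to area."""
    # The cosine of the angular distance AA' is then uniform on [-1, 1].
    return (1.0 - math.cos(alpha)) / 2.0


def sphere_by_arc(alpha):
    """P(lesser arc AA' < alpha) if probability is proportional to arc length
    along a great circle through A."""
    # The lesser arc is then uniform on [0, pi].
    return alpha / math.pi


if __name__ == "__main__":
    for name, method in (("(a)", chord_a), ("(b)", chord_b), ("(c)", chord_c)):
        print(name, round(chance_shorter_than_side(method), 3))
    alpha = math.pi / 3
    print(round(sphere_by_area(alpha), 4), round(sphere_by_arc(alpha), 4))  # 0.25 0.3333
```

Run as a script, the three chord estimates should come out close to ⅔, ½ and ¾, and for α = π/3 the two sphere assumptions give 0.25 and about 0.333. Each figure is correct for its own recipe; the Principle of Indifference simply does not say which recipe describes "at random".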


In general, if $x$ and $f(x)$ are both continuous variables, varying
always in the same or in the opposite sense, and $x$ must lie
between $a$ and $b$, then the probability that $x$ lies between $c$
and $d$, where $a<c<d<b$, seems to be $\frac{d-c}{b-a}$, and the probability
that $f(x)$ lies between $f(c)$ and $f(d)$ to be $\frac{f(d)-f(c)}{f(b)-f(a)}$. These
expressions, which represent the probabilities of necessarily
concordant conclusions, are not, as they ought to be, equal.²
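A minimal numerical instance of the general case, using exact fractions. The choice $f(x) = x^2$ on the interval $[0, 1]$ with $c = \frac{1}{2}$, $d = \frac{3}{4}$ is an arbitrary assumption for illustration, and the function names are hypothetical. The propositions "$\frac{1}{2} < x < \frac{3}{4}$" and "$\frac{1}{4} < x^2 < \frac{9}{16}$" are necessarily concordant, yet the two applications of indifference give them different probabilities.

```python
from fractions import Fraction


def prob_indifferent_in_x(a, b, c, d):
    """P(x between c and d) if equal intervals of x are taken as equally probable."""
    return (d - c) / (b - a)


def prob_indifferent_in_fx(f, a, b, c, d):
    """P(f(x) between f(c) and f(d)) if equal intervals of f(x) are equally probable."""
    return (f(d) - f(c)) / (f(b) - f(a))


def square(t):
    return t * t


a, b, c, d = Fraction(0), Fraction(1), Fraction(1, 2), Fraction(3, 4)
print(prob_indifferent_in_x(a, b, c, d))           # 1/4
print(prob_indifferent_in_fx(square, a, b, c, d))  # 5/16
```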

8.  More  than  one  attempt  has  been  made  to  separate  the 
cases  in  which  the  Principle  of  Indifference  can  be  legitimately 
applied  to  examples  of  geometrical  probability  from  those  in 
which  it  cannot.  M.  Borel  argues  that  the  mathematician  can 
define  the  geometrical  probability  that  a  point  M  lies  on  a  certain 
segment  PQ  of  AD  as  proportional  to  the  length  of  the  segment, 
but  that  this  definition  is  conventional  until  its  consequences 
have been confirmed a posteriori by their conformity with the
results  of  empirical  observation.  He  points  out  that  in  actual 
cases  there  are  generally  some  considerations  present  which 
lead  us  to  prefer  one  of  the  possible  assumptions  to  the  others. 
Whether  or  not  this  is  so,  the  proposed  procedure  amounts  to 
an  abandonment  of  the  Principle  of  Indifference  as  a  valid 
criterion,  and  leaves  our  choice  undetermined  when  further 
evidence  is  not  forthcoming. 

¹ Bertrand, Calcul des probabilités, p. 5.
² See (e.g.) Borel, Éléments de la théorie des probabilités, p. 85.

M. Poincaré, who also held that judgments of equiprobability
in such cases depend upon a 'convention,' endeavoured to minimise
the importance of the arbitrary element by showing that,
under certain conditions, the result is independent of the particular
convention which is chosen. Instead of assuming that the
point is equally likely to lie in every infinitesimal interval $dx$,
we may represent the probability of its lying in this interval by
the function $\phi(x)\,dx$. M. Poincaré showed that, in the game of
rouge et noir, for instance, where we have a number of compartments
arranged in a circle coloured alternately black and white,
if we can assume that $\phi(x)$ is a regular function, continuous and
with continuous differential coefficients, then, whatever the
particular form of the function, the probability of black is
approximately equal to that of white.¹
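The following is a numerical illustration of the kind of result described, not a reproduction of Poincaré's own proof. It assumes that the total angle turned by the wheel has a smooth distribution, that the compartments are narrow compared with the spread of that distribution, and that there is an even number of compartments per turn, so that the colours keep alternating when the angle is unrolled onto a line. The two distributions and all names are arbitrary, hypothetical choices.

```python
import math


def prob_black(cdf, width, n_cells):
    """Probability that the pointer stops in a black compartment.

    Compartment k covers [k * width, (k + 1) * width) of the unrolled angle
    and is black when k is even; cdf gives the distribution of that angle.
    """
    return sum(cdf((k + 1) * width) - cdf(k * width) for k in range(0, n_cells, 2))


def normal_cdf(mu, sigma):
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf


def exponential_cdf(scale):
    def cdf(x):
        return 1.0 - math.exp(-x / scale) if x > 0 else 0.0
    return cdf


if __name__ == "__main__":
    width, n_cells = 0.05, 2000  # narrow compartments covering angles 0 to 100
    print(round(prob_black(normal_cdf(30.0, 5.0), width, n_cells), 4))
    print(round(prob_black(exponential_cdf(10.0), width, n_cells), 4))
```

Two quite different distributions both give a probability of black very close to ½. The near-equality comes from the assumed smoothness, which pairs each black compartment with an adjacent white one of almost the same probability.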

Whether or not investigations on these lines prove to have
a practical value, they have not, I think, any theoretical importance.
If, as I maintain, the probability $\phi(x)$ is not necessarily
numerical, it is not a generally justifiable assumption to
take its continuity for granted. We have, in the particular
example quoted, a number of alternatives, half of which lead to
black and half to white; the assumption of continuity amounts
to the assumption that for every white alternative there is a
black alternative whose probability is very nearly equal to that
of the white. Naturally in such a case we can get an approximately
equal probability for the whites as a whole and for the
blacks as a whole, without assuming equal probability for each
alternative individually. But this fact has no bearing on the
theoretical difficulties which we are discussing.

M. Bertrand is so much impressed by the contradictions of
geometrical probability that he wishes to exclude all examples
in which the number of alternatives is infinite.² It will be argued
in the sequel that something resembling this is true. The discussion
of this question will be resumed in §§ 21-25.

¹ Poincaré, Calcul des probabilités, pp. 120 et seq.

² Bertrand, Calcul des probabilités, p. 4: "Infinity is not a number;
one should not introduce it into reasoning without explanation. The
illusory precision of the words could give rise to contradictions. To choose
at random, among an infinite number of possible cases, is not a sufficient
specification."

9. There is yet another group of cases, distinct in character
from those considered so far, in which the principle does not
seem to provide us with unambiguous guidance. The typical
example is that of an urn containing black and white balls in an
unknown proportion.¹ The Principle of Indifference can be
claimed to support the most usual hypothesis, namely, that all
possible numerical ratios of black and white are equally probable.
But we might equally well assume that all possible constitutions²
of the system of balls are equally probable, so that each individual
ball is assumed equally likely to be black or white. It would
follow from this that an approximately equal number of black
and white balls is more probable than a large excess of one colour.
On this hypothesis, moreover, the drawing of one ball and the
resulting knowledge of its colour leaves unaltered the probabilities
of the various possible constitutions of the rest of the bag;
whereas on the first hypothesis knowledge of the colour of one
ball, drawn and not replaced, manifestly alters the probability
of the colour of the next ball to be drawn. Either of these hypotheses
seems to satisfy the Principle of Indifference, and a believer
in the absolute validity of the principle will doubtless adopt that
one which enters his mind first.³
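The contrast between the two hypotheses can be checked exactly for a small bag. In the sketch below, which is an illustration added here with hypothetical function names, each hypothesis is encoded as a prior over the number k of black balls among n. The code then computes what the bag of n - 1 balls left after one black ball has been drawn and not replaced looks like. Under the 'constitution' hypothesis the remainder has exactly the distribution the same hypothesis would give a fresh bag of n - 1 balls; under the 'ratio' hypothesis it does not.

```python
from fractions import Fraction
from math import comb


def ratio_prior(n):
    """'Ratio' hypothesis: every number k = 0..n of black balls equally probable."""
    return {k: Fraction(1, n + 1) for k in range(n + 1)}


def constitution_prior(n):
    """'Constitution' hypothesis: all 2**n colourings equally probable,
    so the number of black balls is binomial(n, 1/2)."""
    return {k: Fraction(comb(n, k), 2 ** n) for k in range(n + 1)}


def rest_after_one_black(prior, n):
    """Distribution of the number of black balls among the n - 1 balls left
    after one ball, drawn at random and not replaced, proves black."""
    joint = {k - 1: p * Fraction(k, n) for k, p in prior.items() if k > 0}
    total = sum(joint.values())
    return {j: p / total for j, p in joint.items()}


if __name__ == "__main__":
    n = 4
    for name, prior_fn in (("ratio", ratio_prior), ("constitution", constitution_prior)):
        after = rest_after_one_black(prior_fn(n), n)
        print(name, after == prior_fn(n - 1), after)
```

With n = 4 the 'ratio' hypothesis shifts the remainder towards black (weights 1/10, 1/5, 3/10, 2/5 for 0 to 3 black), while the 'constitution' hypothesis leaves it at the binomial weights 1/8, 3/8, 3/8, 1/8.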

The same point is very clearly illustrated by an example
which I take from Von Kries. Two cards, chosen from different
packs, are placed face downwards on the table; one is taken
up and found to be of a black suit: what is the chance that the
other is black also? One would naturally reply that the
chance is even. But this is based on the supposition, relatively
unpopular with writers on the subject, that every 'constitution'
is equally probable, i.e. that each individual card is as likely
to be black as red. If we prefer this assumption, we must relinquish
the text-book theory that the drawing of a black ball from
an urn, containing black and white balls in unknown proportions,
affects our knowledge as to the proportion of black and white
amongst the remaining balls.

¹ The difficulty in question was first pointed out by Boole, Laws of Thought,
pp. 369-370. After discussing the Law of Succession, Boole proceeds to show
that "there are other hypotheses, as strictly involving the principle of the
'equal distribution of knowledge or ignorance' which would also conduct to
conflicting results." See also Von Kries, op. cit. pp. 31-34, 59, and Stumpf,
Über den Begriff der mathematischen Wahrscheinlichkeit, Bavarian Academy,
1892, pp. 64-68.

² If A and B are two balls, A white, B black, and A black, B white, are
different 'constitutions.' But if we consider different numerical ratios, these
two cases are indistinguishable, and count as one only.

³ C. S. Peirce in his Theory of Probable Inference (Johns Hopkins Studies in
Logic), pp. 172, 173, argues that the 'constitution' hypothesis is alone valid,
on the ground that, of the two hypotheses, only this one is consistent with itself.
I agree with his conclusion, and shall give at the close of the chapter the
fundamental considerations which lead to the rejection of the 'ratio' hypothesis.
Stumpf points out that the probability of drawing a white ball is, in any
case, ½. This is true; but the probability of a second white clearly depends
upon which of the two hypotheses has been preferred. Nitsche (loc. cit. p. 31)
seems to miss the point of the difficulty in the same way.

The alternative, or text-book, theory assumes that there
are three equal possibilities: one of each colour, both black, both
red. If both cards are black, we are twice as likely to turn up
a black card as we are if only one is black. After we have turned up
a black, the probability that the other is black is, therefore, twice
as great as the probability that it is red. The chance of the
second's being black is therefore ⅔.¹ The Principle of Indifference
has nothing to say against either solution. Until some further
criterion has been proposed we seem compelled to agree with
Poincaré that a preference for either hypothesis is wholly arbitrary.
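Both answers to the two-card problem follow mechanically once a prior over the four colour pairs is fixed. This sketch, added for illustration, assumes that the card turned up is chosen at random from the two. It also assumes that the text-book weight of ⅓ for 'one of each colour' is split evenly between the two orders; that split is an assumption, not something stated above. The names are hypothetical.

```python
from fractions import Fraction

# Colour pairs (first card, second card): "B" = black suit, "R" = red suit.
CONSTITUTION = {("B", "B"): Fraction(1, 4), ("B", "R"): Fraction(1, 4),
                ("R", "B"): Fraction(1, 4), ("R", "R"): Fraction(1, 4)}
# Text-book theory: both black, one of each, both red equally probable.
TEXT_BOOK = {("B", "B"): Fraction(1, 3), ("B", "R"): Fraction(1, 6),
             ("R", "B"): Fraction(1, 6), ("R", "R"): Fraction(1, 3)}


def chance_other_black(weights):
    """P(other card black | the card turned up, chosen at random, is black)."""
    seen_black = Fraction(0)
    both_black = Fraction(0)
    for (first, second), w in weights.items():
        for shown, other in ((first, second), (second, first)):
            if shown == "B":
                seen_black += w / 2  # each card is turned up with chance 1/2
                if other == "B":
                    both_black += w / 2
    return both_black / seen_black


print(chance_other_black(CONSTITUTION))  # 1/2
print(chance_other_black(TEXT_BOOK))     # 2/3
```

The arithmetic is identical in the two runs; only the prior differs, which is exactly the choice the Principle of Indifference leaves open.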

10.  Such,  then,  are  the  kinds  of  result  to  which  an  unguarded 
use  of  the  Principle  of  Indifference  may  lead  us.  The  difficulties, 
to  which  attention  has  been  drawn,  have  been  noticed  before  ; 
but  the  discredit  has  not  been  emphatically  thrown  on  the 
original  source  of  error.  Yet  the  principle  certainly  remains  as 
a  negative  criterion  ;  two  propositions  cannot  be  equally  probable, 
so  long  as  there  is  any  ground  for  discriminating  between  them. 
The  principle  is  a  necessary,  but  not,  as  it  seems,  a  sufficient 
condition. 

The enunciation of some sufficient rule is certainly essential if
we are to make any progress in the subject. But the difficulty
of discovering a correct principle is considerable. This difficulty
is partly responsible, I think, for the doubts which philosophers
and many others have often felt regarding any practical application
of the Calculus. Many candid persons, when confronted
with the results of Probability, feel a strong sense of the uncertainty
of the logical basis upon which it seems to rest. It is
difficult to find an intelligible account of the meaning of 'probability,'
or of how we are ever to determine the probability of any
particular proposition; and yet treatises on the subject profess
to arrive at complicated results of the greatest precision and the
most profound practical importance.

¹ This is Poisson's solution, Recherches, p. 90.

The incautious methods and exaggerated claims of the school
of Laplace have undoubtedly contributed towards the existence
of these sentiments. But the general scepticism, which I believe
to be much more widely spread than the literature of the subject
admits,  is  more  fundamental.  In  this  matter  Hume  need  not 
have  felt  "  affrighted  and  confounded  with  that  forelorn  solitude, 
in  which  I  am  placed  in  my  philosophy,"  or  have  fancied  himself 
"  some  strange  uncouth  monster,  who  not  being  able  to  mingle 
and  unite  in  society,  has  been  expell'd  all  human  commerce, 
and  left  utterly  abandon'd  and  disconsolate."  In  his  views  on 
probability,  he  stands  for  the  plain  man  against  the  sophisms 
and  ingenuities  of  "  metaphysicians,  logicians,  mathematicians, 
and  even  theologians." 

Yet such scepticism goes too far. The judgments of probability,
upon which we depend for almost all our beliefs in matters
of  experience,  undoubtedly  depend  on  a  strong  psychological 
propensity  in  us  to  consider  objects  in  a  particular  light.  But 
this  is  no  ground  for  supposing  that  they  are  nothing  more  than 
"  lively  imaginations."  The  same  is  true  of  the  judgments  in 
virtue  of  which  we  assent  to  other  logical  arguments  ;  and  yet 
in  such  cases  we  believe  that  there  may  be  present  some  element 
of  objective  validity,  transcending  the  psychological  impulsion, 
with  which  primarily  we  are  presented.  So  also  in  the  case  of 
probability,  we  may  believe  that  our  judgments  can  penetrate 
into  the  real  world,  even  though  their  credentials  are  subjective. 

11.  We  must  now  inquire  how  far  it  is  possible  to  rehabilitate 
the  Principle  of  Indifference  or  find  a  substitute  for  it.     There 
are several distinct difficulties which need attention in a
discussion of the problems raised in the preceding paragraphs.
Our  first  object  must  be  to  make  the  Principle  itself  more  precise 
by  disclosing  how  far  its  application  is  mechanical  and  how  far 
it  involves  an  appeal  to  logical  intuition. 

12.  Without  compromising  the  objective  character  of  relations 
of  probability,  we  must  nevertheless  admit  that  there  is  little 
likelihood  of  our  discovering  a  method  of  recognising  particular 
probabilities,  without  any  assistance  whatever  from  intuition  or 
direct  judgment.     Inasmuch  as  it  is  always  assumed  that  we  can 
sometimes  judge  directly  that  a  conclusion  follows  from  a  premiss, 
it  is  no  great  extension  of  this  assumption  to  suppose  that  we 
can  sometimes  recognise  that  a  conclusion  partially  follows  from, 
or  stands  in  a  relation  of  probability  to,  a  premiss.     Moreover, 
the  failure  to  explain  or  define  '  probability  '  in  terms  of  other 
logical  notions,  creates  a  presumption  that  particular  relations 
of probability must be, in the first instance, directly recognised
as such, and cannot be evolved by rule out of data which
themselves contain no statements of probability.

On the other hand, although we cannot exclude every element
of direct judgment, these judgments may be limited and controlled,
perhaps, by logical rules and principles which possess a
general application. While we may possess a faculty of direct
recognition of many relations of probability, as in the case of
many other logical relations, yet some may be much more
easily recognisable than others. The object of a logical system
of probability is to enable us to know the relations, which
cannot be easily perceived, by means of other relations which
we can recognise more distinctly: to convert, in fact, vague
knowledge into more distinct knowledge.¹

¹ As it is the aim of trigonometry to determine the position of an object,
which is in a sense visible, not by a direct observation of it, but by observing
some other object together with certain relations, so an indirect method of this
kind is the aim of all logical system. If the truth of some propositions, and the
validity of some arguments, could not be recognised directly, we could make no
progress. We may have, moreover, some power of direct recognition where it
is not necessary in our logical system that we should make use of it. In these
cases the method of logical proof increases the certainty of knowledge, which
we might be able to possess in a more doubtful manner without it. In other
cases, that, for instance, of a complicated mathematical theorem, it enables
us to know propositions to be true, which are altogether beyond the reach of
our direct insight; just as we can often obtain knowledge about the position
of a partially visible or even invisible object by starting with observations of
other objects.

13. Let us seek to distinguish between the element of direct
judgment and the element of mechanical rule in the Principle
of Indifference. The enunciation of this principle, as it is
ordinarily expressed, cloaks, but does not avoid, the former
element. It is in part a formula and in part an appeal to direct
inspection: but in addition to the obscurity and ambiguity of
the formula, the appeal to intuition is not as explicit as it should
be. The principle states that 'there must be no known
reason for preferring one of a set of alternatives to any other.'
What does this mean? What are 'reasons,' and how are
we to know whether they do or do not justify us in preferring
one alternative to another? I do not know any discussion
of Probability in which this question has been so much as
asked. If, for example, we are considering the probability
of drawing a black ball from an urn containing balls which are
black and white, we assume that the difference of colour between
the balls is not a reason for preferring either alternative.
But  how  do  we  know  this,  unless  by  a  judgment  that,  on  the 
evidence  in  hand,  our  knowledge  of  the  colours  of  the  balls  is 
irrelevant  to  the  probability  in  question  ?  We  know  of  some 
respects  in  which  the  alternatives  differ  ;  but  we  judge  that  a 
knowledge  of  these  differences  is  not  relevant.  If,  on  the  other 
hand,  we  were  taking  the  balls  out  of  the  urn  with  a  magnet, 
and  knew  that  the  black  balls  were  of  iron  and  the  white  of  tin, 
we  might  regard  the  fact,  that  a  ball  was  iron  and  not  tin,  as 
very  important  in  determining  the  probability  of  its  being 
drawn.  Before,  then,  we  can  begin  to  apply  the  Principle  of 
Indifference,  we  must  have  made  a  number  of  direct  judgments 
to the effect that the probabilities under consideration are
unaffected by the inclusion in the evidence of certain particular
details.  We  have  no  right  to  say  of  any  known  difference 
between  the  two  alternatives  that  it  is  '  no  reason  '  for  preferring 
one  of  them,  unless  we  have  judged  that  a  knowledge  of  this 
difference  is  irrelevant  to  the  probability  in  question. 

14. A brief digression is now necessary, in order to introduce
some new terms. There are in general two principal types of
probabilities, the magnitudes of which we seek to compare:
those in which the evidence is the same and the conclusions
different, and those in which the evidence is different but the
conclusion the same. Other types of comparison may be required,
but these two are by far the commonest. In the first
we compare the likelihood of two conclusions on given evidence;
in the second we consider what difference a change of evidence
makes to the likelihood of a given conclusion. In symbolic
language we may wish to compare $x/h$ with $y/h$, or $x/h$ with
$x/h_1h$. We may call the first type judgments of preference, or,
when there is equality between $x/h$ and $y/h$, of indifference; and
the second type we may call judgments of relevance, or, when there
is equality between $x/h$ and $x/h_1h$, of irrelevance. In the first
we consider whether or not $x$ is to be preferred to $y$ on evidence $h$;
in the second we consider whether the addition of $h_1$ to evidence
$h$ is relevant to $x$.

The Principle of Indifference endeavours to formulate a rule
which will justify judgments of indifference. But the rule that
there must be no ground for preferring one alternative to another,
involves, if it is to be a guiding rule at all, and not a petitio
principii, an appeal to judgments of irrelevance.

The simplest definition of Irrelevance is as follows: $h_1$ is
irrelevant to $x$ on evidence $h$, if the probability of $x$ on evidence $h_1h$
is the same as its probability on evidence $h$.¹ But for a reason
which will appear in Chapter VI., a stricter and more complicated
definition, as follows, is theoretically preferable: $h_1$ is irrelevant
to $x$ on evidence $h$, if there is no proposition, inferrible from $h_1h$
but not from $h$, such that its addition to evidence $h$ affects the
probability of $x$.² Any proposition which is irrelevant in the
strict sense is, of course, also irrelevant in the simpler sense;
but if we were to adopt the simpler definition, it would sometimes
occur that a part of evidence would be relevant, which taken as
a whole was irrelevant. The more elaborate definition, by avoiding
this, proves in the sequel more convenient. If the condition
$x/h_1h = x/h$ alone is satisfied, we may say that the evidence $h_1$
is 'irrelevant as a whole.'³
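A small numerical model shows how evidence can be irrelevant as a whole while containing a relevant part. The model is an assumption made only for illustration: the probabilities considered in the text need not be numerical, and the die and all names below are hypothetical. Take $h$ = "a fair die is thrown", $x$ = "the throw is even", and $h_1$ = "the throw is at most 2". Then $x/h_1h = x/h$, but "the throw is at most 3", which follows from $h_1h$ and not from $h$, does change the probability of $x$.

```python
from fractions import Fraction

# Hypothetical model: one throw of a fair die, each face equally probable.
FACES = frozenset(range(1, 7))       # evidence h
EVEN = frozenset({2, 4, 6})          # conclusion x
AT_MOST_2 = frozenset({1, 2})        # additional evidence h1
AT_MOST_3 = frozenset({1, 2, 3})     # a proposition inferable from h1 h but not from h


def prob(x, evidence):
    """Probability of x on the given evidence, counting equally probable faces."""
    return Fraction(len(x & evidence), len(evidence))


print(prob(EVEN, FACES))      # 1/2   x/h
print(prob(EVEN, AT_MOST_2))  # 1/2   x/h1h: h1 is irrelevant as a whole
print(prob(EVEN, AT_MOST_3))  # 1/3   yet a consequence of h1 is relevant
```

On the simpler definition $h_1$ counts as irrelevant; on the stricter one it does not, because one of its consequences alters the probability of $x$.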

It will be convenient to define also two other phrases. $h_1$
and $h_2$ are independent and complementary parts of the evidence,
if between them they make up $h$ and neither can be inferred from
the other. If $x$ is the conclusion, and $h_1$ and $h_2$ are independent
and complementary parts of the evidence, then $h_1$ is relevant if
the addition of it to $h_2$ affects the probability of $x$.⁴

Some propositions regarding irrelevance will be proved in
Part II. If $\bar h_1$ is the contradictory of $h_1$ and $x/h_1h = x/h$, then
$x/\bar h_1h = x/h$. Thus the contradictory of irrelevant evidence is
also irrelevant. Also, if $x/yh = x/h$, it follows that $y/xh = y/h$.
Hence if, on initial evidence $h$, $y$ is irrelevant to $x$, then, on the
same initial evidence, $x$ is irrelevant to $y$, i.e. if in a given state
of knowledge one occurrence has no bearing on another, then
equally the second has no bearing on the first.
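For the special case in which the probabilities are numerical, both results can be derived in a few lines. This sketch assumes the usual multiplication and addition theorems and that the probabilities divided by are not zero. The general proofs promised for Part II need not proceed this way.

```latex
% Assumes numerical probabilities and non-zero divisors.
% Symmetry of irrelevance (multiplication theorem):
\begin{align*}
xy/h &= x/yh \cdot y/h = y/xh \cdot x/h, \\
x/yh = x/h,\ x/h \neq 0 \;&\Rightarrow\; x/h \cdot y/h = y/xh \cdot x/h
  \;\Rightarrow\; y/xh = y/h.
\end{align*}
% Contradictory of irrelevant evidence (addition theorem):
\begin{align*}
x/h &= x/h_1h \cdot h_1/h + x/\bar h_1h \cdot \bar h_1/h,
  \qquad h_1/h + \bar h_1/h = 1, \\
x/h_1h = x/h \;&\Rightarrow\; x/h \cdot \bar h_1/h = x/\bar h_1h \cdot \bar h_1/h
  \;\Rightarrow\; x/\bar h_1h = x/h \quad (\bar h_1/h \neq 0).
\end{align*}
```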

¹ That is to say, $h_1$ is irrelevant to $x/h$ if $x/h_1h = x/h$.

² That is to say, $h_1$ is irrelevant to $x/h$, if there is no proposition $h_1'$ such that
$h_1'/h_1h = 1$, $h_1'/h \neq 1$, and $x/h_1'h \neq x/h$.

³ Where no misunderstanding can arise, the qualification 'as a whole' will
be sometimes omitted.

⁴ I.e. (in symbolism) $h_1$ and $h_2$ are independent and complementary parts of
$h$ if $h_1h_2 = h$, $h_1/h_2 \neq 1$, and $h_2/h_1 \neq 1$. Also $h_1$ is relevant if $x/h \neq x/h_2$.

15. This distinction enables us to formulate the Principle of
Indifference at any rate more precisely. There must be no
relevant evidence relating to one alternative, unless there is
corresponding evidence relating to the other; our relevant
evidence, that is to say, must be symmetrical with regard to the
alternatives, and must be applicable to each in the same manner.
This is the rule at which the Principle of Indifference somewhat
obscurely aims. We must first determine what parts of our
evidence are relevant on the whole by a series of judgments of relevance,
not easily reduced to rule, of the type described above.
If this relevant evidence is of the same form for both alternatives,
then the Principle authorises a judgment of indifference.

16. This rule can be expressed more precisely in symbolic
language. Let us assume, to begin with, that the alternative
conclusions are expressible in the forms $\phi(a)$ and $\phi(b)$, where
$\phi(x)$ is a propositional function.¹ The difference between them,
that is to say, can be represented in terms of a single variable.

The Principle of Indifference is applicable to the alternatives
$\phi(a)$ and $\phi(b)$, when the evidence $h$ is so constituted that, if $f(a)$
is an independent part of $h$ (see § 14) which is relevant to $\phi(a)$,
and does not contain any independent parts which are irrelevant
to $\phi(a)$, then $h$ includes $f(b)$ also.

The rule can be extended by successive steps to cases in
which we have more than one variable. We can, if the necessary
conditions are fulfilled, successively compare the probabilities
of $\phi(a_1a_2)$ and $\phi(b_1a_2)$, and of $\phi(b_1a_2)$ and $\phi(b_1b_2)$, and establish
equality between $\phi(a_1a_2)$ and $\phi(b_1b_2)$.

This  elucidation  is  suited  to  most  of  the  cases  to  which  the 
Principle  of  Indifference  is  ordinarily  applied.  Thus  in  the 
favourite  examples  in  which  balls  are  drawn  from  urns,  we  can 
infer  from  our  evidence  no  relevant  proposition  about  white  balls, 
such  that  we  cannot  infer  a  corresponding  proposition  about 
black  balls.  Most  of  the  examples,  to  which  the  mathematical 
theory  of  chances  has  been  applied,  and  which  depend  upon  the 
Principle  of  Indifference,  can  be  arranged,  I  think,  in  the  forms 
which  the  rule  requires  as  formulated  above. 
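Only the last, mechanical step of this rule can be put into code. The judgments of relevance that select the evidence have to be made beforehand and cannot be automated. The sketch below assumes those judgments have already been made, and represents the surviving relevant evidence as hypothetical (predicate, individual) facts standing for propositions of the form $f(a)$. It then tests whether exchanging $a$ and $b$ maps that evidence onto itself, using the urn example and the magnet variant of § 13. The predicate names are illustrative inventions.

```python
def swap(fact, a, b):
    """Exchange the individuals a and b in a (predicate, individual) fact."""
    predicate, individual = fact
    if individual == a:
        return (predicate, b)
    if individual == b:
        return (predicate, a)
    return fact


def indifference_applies(relevant_facts, a, b):
    """True when every relevant fact f(a) is matched by f(b), and vice versa."""
    facts = set(relevant_facts)
    return {swap(f, a, b) for f in facts} == facts


urn = {("is_colour_of_some_balls", "black"), ("is_colour_of_some_balls", "white")}
magnet = urn | {("is_colour_of_iron_balls", "black"), ("is_colour_of_tin_balls", "white")}

print(indifference_applies(urn, "black", "white"))     # True
print(indifference_applies(magnet, "black", "white"))  # False
```

The check is only as good as the list of relevant facts it is given. Leaving out the iron and tin facts by a false judgment of irrelevance would make the magnet case look symmetrical.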

¹ If $\phi(a)$, $\phi(b)$, etc., are propositions, and $x$ is a variable, capable of taking
the values $a$, $b$, etc., then $\phi(x)$ is a propositional function.

17. We can now clear up the difficulties which arose over the
group of cases dealt with in § 9, the typical example of which was
the problem of the urn containing black and white balls in an
unknown proportion. This more precise enunciation of the
Principle enables us to show that of the two solutions the
equiprobability of each 'constitution' is alone legitimate, and the
equiprobability of each numerical ratio erroneous. Let us write
the alternative 'The proportion of black balls is $x$' $= \phi(x)$, and
the datum 'There are $n$ balls in the bag, with regard to none
of which it is known whether they are black or white' $= h$.
On the 'ratio' hypothesis it is argued that the Principle of
Indifference justifies the judgment of indifference, $\phi(x)/h =
\phi(y)/h$. In order that this may be valid, it must be possible to
state the relevant evidence in the form $f(x)\,f(y)$. But this is
not the case. If $x = \frac{1}{2}$ and $y = \frac{1}{4}$, we have relevant knowledge
about the way in which a proportion of black balls of one half
can arise, which is not identical with our knowledge of the way
in which a proportion of one quarter can arise. If there are four
balls, A, B, C, D, one half are black, if A, B or A, C or A, D or
B, C or B, D or C, D are black; and one quarter are black,
if A or B or C or D are black. These propositions are not identical
in form, and only by a false judgment of irrelevance can we
ignore them. On the 'constitution' hypothesis, however,
where A, B black and A, C black are treated as distinct alternatives,
this want of symmetry in our relevant evidence cannot
arise.
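The asymmetry for four balls can be listed mechanically. This sketch, added for illustration with hypothetical names, enumerates all 16 colourings of balls A, B, C, D. It shows that a proportion of one half can arise in six ways and a proportion of one quarter in four, so giving each ratio equal weight treats unequal bodies of evidence as if they were alike.

```python
from fractions import Fraction
from itertools import product

BALLS = "ABCD"


def black_sets(proportion):
    """The 'constitutions' of balls A-D giving the stated proportion of black,
    each written as the set of balls that are black."""
    result = []
    for colours in product("BW", repeat=len(BALLS)):
        if Fraction(colours.count("B"), len(BALLS)) == proportion:
            result.append("".join(ball for ball, c in zip(BALLS, colours) if c == "B"))
    return result


half, quarter = black_sets(Fraction(1, 2)), black_sets(Fraction(1, 4))
print(half)     # ['AB', 'AC', 'AD', 'BC', 'BD', 'CD']
print(quarter)  # ['A', 'B', 'C', 'D']
# Equal weight per constitution (16 in all) gives these ratios 6/16 and 4/16;
# equal weight per ratio (5 possible ratios) would give each 1/5.
print(Fraction(len(half), 2 ** len(BALLS)), Fraction(len(quarter), 2 ** len(BALLS)))  # 3/8 1/4
```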

18. We can also deal with the point which was illustrated by
the difficulty raised in § 4. We considered there the probabilities
of $a$ and its contradictory $\bar a$ when there is no external evidence
relevant to either. What exactly do we mean by saying that
there is no relevant evidence? Is the addition of the word
'external' significant? If $a$ represents a particular proposition,
we must know something about it, namely, its meaning. May
not the apprehension of its meaning afford us some relevant
evidence? If so, such evidence must not be excluded. If, then,
we say that there is no relevant evidence, we must mean no
evidence beyond what arises from the mere apprehension of the
meaning of the symbol $a$. If we attach no meaning to the
symbol, it is useless to discuss the value of the probability; for
the probability, which belongs to a proposition as an object of
knowledge, not as a form of words, cannot in such a case exist.

What exactly does the symbol $a$ stand for in the above?
Does it stand for any proposition of which we know no more
than that it is a proposition? Or does it stand for a particular
proposition which we understand but of which we know no more
than is involved in understanding it? In the former case we
cannot extend our result to a proposition of which we know even
the meaning; for we should then know more than that it is a
proposition; and in the latter case we cannot say what the
probability of $a$ is as compared with that of its contradictory,
until we know what particular proposition it stands for; for, as
we have seen, the proposition itself may supply relevant evidence.
This suggests that a source of much confusion may lie in the
use of symbols and the notion of variables in probability. In
the logic of implication, which deals not with probability but
with truth, what is true of a variable must be equally true of all
instances of the variable. In Probability, on the other hand,
we must be on our guard wherever a variable occurs. In Implication
we may conclude that $\psi$ is true of anything of which
$\phi$ is true. In Probability we may conclude no more than that
$\psi$ is probable of anything of which we only know that $\phi$ is true of
it. If $x$ stands for anything of which $\phi(x)$ is true, as soon as
we substitute in probability any particular value, whose meaning
we know, for $x$, the value of the probability may be affected;
for knowledge, which was irrelevant before, may now become
relevant. Take the following example: does $\phi(a)/\psi(a) =
\phi(b)/\psi(b)$? That is to say, is the probability of $\phi$'s being true
of $a$, given only that $\psi$ is true of $a$, equal to the probability of
$\phi$'s being true of $b$, given only that $\psi$ is true of $b$? If this simply
means that the probability of an object's satisfying $\phi$ about
which nothing is known except that it satisfies $\psi$ is equal to
ditto ditto, the equation is an identity. For in this case $\phi(a)/\psi(a)$
means the same as $\phi(b)/\psi(b)$, i.e. we know nothing about $a$ and $b$
except that they satisfy $\psi$, and there is nothing whatever by
which we can distinguish $a$ from $b$. But if $a$ and $b$ represent
specific entities, which we can distinguish, then the equality
does not necessarily hold. If, for instance, $\phi(x)$ stands for '$x$ is
Socrates,' then it is plainly false that $\phi(a)/\psi(a) = \phi(b)/\psi(b)$, where
$a$ stands for Socrates and $b$ does not.

19. Bearing this danger in mind, we can now give further
precision to the enunciation of the Principle of Indifference given
in § 16. Our knowledge of the meaning of a must be taken
account of so far as it is relevant; and the Principle is only satisfied
if we have corresponding knowledge about the meaning of b.
Thus $\phi(a)/h = \phi(b)/h$ may be true for one pair of values a, b, and
not true for another pair of values a', b'.

This makes it possible to explain in part the contradiction
discussed in § 4. Even if it were true that the probability of a is
½, when we know nothing except that a is a proposition, it does
not follow that the probability of 'This book is red' is ½, when
we know the meanings of 'book' and 'red,' even if we know no
more than this. Knowledge arising directly out of acquaintance
with the meaning of 'red' may be sufficient to enable us to infer
that 'red' and 'not-red' are not satisfactory alternatives to
which to apply the Principle of Indifference. How this may
come about will be discussed in §§ 20, 21.

But the contradictions are not yet really solved; for some
of the difficulties discussed in § 4 can arise even when we know
no more of a and b than that they are different propositions. In
fact, although we have now stated more clearly than before how
the Principle should be enunciated, it is not yet possible to explain
or to avoid all the contradictions to which it led us in §§ 4 to 7.
For this purpose we must proceed to a further qualification.

20.  The  examples,  in  which  the  Principle  of  Indifference 
broke  down,  had  a  great  deal  in  common.  We  broke  up  the 
field  of  possibility,  as  we  may  term  it,  into  a  number  of  areas 
by  a  series  of  disjunctive  judgments.  But  the  alternative  areas 
were  not  ultimate.  They  were  capable  of  further  subdivision 
into other areas similar in kind to the former. The paradoxes
and  contradictions  arose,  in  each  case,  when  the  alternatives, 
which  the  Principle  of  Indifference  treated  as  equivalent,  actually 
contained  or  might  contain  a  different  or  an  indefinite  number  of 
more  elementary  units. 

In the type of cases in which the Principle of Indifference
seemed to permit the assertion that, in the absence of relevant
evidence, a proposition is as likely as its contradictory, its contradictory
is not an ultimate and indivisible alternative (in the
sense to be explained in § 21 below), even if the proposition itself
satisfies this condition. For its contradictory can be disjunctively
resolved into an indefinite number of sets of contraries to
the proposition. It was out of this that our difficulties first arose.
'This book is not red' includes amongst others the alternatives
'This book is black' and 'This book is blue.' It is not, therefore,
an ultimate alternative.

In the same way the contradiction of § 5 arose out of the possibility
of splitting the alternative 'He inhabits the British
Isles' into the sub-alternatives 'He inhabits Ireland or he
inhabits Great Britain.' And in the third type of case, to
which the example of specific volume and density belongs, the
alternative 'v lies in the interval 1 to 2' can be broken up into
the sub-alternatives 'v lies in the interval 1 to 1½ or 1½ to 2.'

21.  This,  then,  seems  to  point  the  way  to  the  qualification  of 
which  we  are  in  search.  We  must  enunciate  some  formal  rule 
which  will  exclude  those  cases,  in  which  one  of  the  alternatives 
involved  is  itself  a  disjunction  of  sub-alternatives  of  the  same 
form.  For  this  purpose  the  following  condition  is  proposed. 

Let the alternatives, the equiprobability of which we seek to
establish by means of the Principle of Indifference, be
$\phi(a_1), \phi(a_2), \ldots, \phi(a_r)$,¹ and let the evidence be h. Then it is a necessary
condition for the application of the principle that these
should be, relatively to the evidence, indivisible alternatives of
the form $\phi(x)$. We may define a divisible alternative in the
following manner:

An alternative $\phi(a_r)$ is divisible if

(i.) $[\phi(a_r) \equiv \phi(a_r') + \phi(a_r'')]/h = 1$,
(ii.) $\phi(a_r')\,\phi(a_r'')/h = 0$,
(iii.) $\phi(a_r')/h \neq 0$ and $\phi(a_r'')/h \neq 0$.
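The three conditions can be checked mechanically in a toy finite model. The sketch below is only an illustration under assumptions of our own: propositions are represented as sets of possible "worlds", the evidence h as the set of worlds it leaves open, and $\phi(x)$ ('he inhabits x', after the example of § 5) as a hypothetical table `PHI`. The places and the helper `is_divisible` are invented for the purpose.

```python
from itertools import combinations

# Hypothetical toy model: each "world" is the place where the individual lives.
WORLDS = frozenset({"Ireland", "Great Britain", "France"})

# phi(x) = "he inhabits x", given as the set of worlds in which it is true.
PHI = {
    "British Isles": frozenset({"Ireland", "Great Britain"}),
    "Ireland": frozenset({"Ireland"}),
    "Great Britain": frozenset({"Great Britain"}),
    "France": frozenset({"France"}),
}


def is_divisible(a_r, phi, h):
    """Return True if phi(a_r) splits, relatively to h, into two instances
    phi(a1), phi(a2) of the same function satisfying conditions (i)-(iii)."""
    target = phi[a_r] & h
    for a1, a2 in combinations(phi, 2):
        p1, p2 = phi[a1] & h, phi[a2] & h
        equivalent = (p1 | p2) == target   # (i) phi(a_r) = phi(a1) + phi(a2) on h
        exclusive = not (p1 & p2)          # (ii) phi(a1) phi(a2) / h = 0
        possible = bool(p1) and bool(p2)   # (iii) each is possible on h
        if equivalent and exclusive and possible:
            return True
    return False


print(is_divisible("British Isles", PHI, WORLDS))  # True
print(is_divisible("Ireland", PHI, WORLDS))        # False
```

The words "relatively to the evidence" matter: on evidence that excludes Great Britain, `is_divisible("British Isles", PHI, frozenset({"Ireland", "France"}))` returns False, because one of the two candidate sub-alternatives is no longer possible.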

The condition that the sub-alternatives must be of the same
form as the original alternatives, i.e. expressible by means of the
same propositional function $\phi(x)$, deserves attention. It might
be the case that the original alternatives had nothing substantial
in common; i.e. $\phi(x) = x$ is the only propositional function
common to all of them, the alternatives being $a_1, a_2, \ldots, a_r$. In
these circumstances the condition in question cannot be satisfied.
For the proposition $a_r$ can always be resolved into the disjunction
$a_r b + a_r \bar{b}$, where b is any proposition and $\bar{b}$ its contradictory. If,
on the other hand, the alternatives which we are comparing can
be expressed in the forms $\phi(a_1)$ and $\phi(a_2)$, where the function
$\phi(x)$ is distinct from x, it is not necessarily the case that either
of these can be resolved into a disjunctive combination of terms
which can be expressed in their turn in the same form.

Dispensing with symbolism, we can express these conditions
as follows: Our knowledge must not enable us to split up the
alternative $\phi(a_r)$ into a disjunction of two sub-alternatives, (i.)
which are themselves expressible in the same form $\phi$, (ii.) which
are mutually exclusive, and (iii.) which, on the evidence, are
possible.

¹ The more complicated cases in which the propositional function, of which
the alternatives are instances, involves more than one variable (see § 16), can be
dealt with in a similar manner mutatis mutandis.

In  short,  the  Principle  of  Indifference  is  not  applicable  to  a 
pair  of  alternatives,  if  we  know  that  either  of  them  is  capable  of 
being  further  split  up  into  a  pair  of  possible  but  incompatible 
alternatives  of  the  same  form  as  the  original  pair. 

22. This rule commends itself to common sense. If we
know that the two alternatives are compounded of a different
number or of an indefinite number of sub-alternatives which are
in other respects similar, so far as our evidence goes, to the
original alternatives, then this is a relevant fact of which we
must take account. And as it affects the two alternatives in
differing and unsymmetrical ways, it breaks down the fundamental
condition for the valid application of the Principle of
Indifference.

Neither this consideration nor that discussed in §§ 18 and 19
substantially modifies the Principle of Indifference as enunciated
in § 16. They have only served to make explicit what was
always implicit in the Principle, by explaining the manner in
which our knowledge of the form and meaning of the alternatives
may be a relevant part of the evidence. The apparent contradictions
arose from paying attention to what we may term
the extraneous evidence only, to the neglect of such part of the
evidence as bore upon the form and meaning of the alternatives.

23. The application of this result to the examples cited in § 18
is not difficult. It excludes the class of cases in which a proposition
and its contradictory constitute the alternatives. For
if b is the proposition and $\bar{b}$ its contradictory, we cannot find
a propositional function $\phi(x)$ which will satisfy the necessary
conditions. It deals also with the type of contradiction which
arose in considering the probability that an individual taken at
random was an inhabitant of a given region. If, on the other
hand, the term 'country' is so defined that one country cannot
include two countries, then an individual is, relatively to suitable
hypotheses, as likely to be an inhabitant of one as of another.
For the function $\phi(x)$, where $\phi(x)$ = 'the individual is an inhabitant
of country x,' satisfies the conditions. And it deals
with the example of ranges of specific volume and specific density,
because there is no range which does not contain within itself two
similar ranges. As there are in this case no definite units by
which we can define equal ranges, the device, which will be referred
to in § 25 for dealing with geometrical probabilities, is not available.

24. It is worth while to add that the qualification of § 21 is
fatal to the practical utility of the Principle of Indifference in
those cases only in which it is possible to find no ultimate alternatives
which satisfy the conditions. For if the original alternatives
each comprise a definite number of indivisible and indifferent
sub-alternatives, we can compute their probabilities. It is often
the case, however, that we cannot by any process of finite subdivision
arrive at indivisible sub-alternatives, or that, if we can,
they are not on the evidence indifferent. In the examples given
above, for instance, where $\phi(x) = x$, or where x is a part of unspecified
magnitude in a continuum, there are no indivisible
sub-alternatives. The first type comprises all cases, amongst
others, in which we weigh the probabilities of a proposition and
its contradictory; and the second includes a great number of
cases in which physical or geometrical quantities are involved.
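Where the original alternatives do each comprise a definite number of indivisible and indifferent sub-alternatives, the computation is a matter of counting. The sketch below assumes, as the text requires, that the listed sub-alternatives are indivisible, mutually exclusive and indifferent on the evidence, and further that together they exhaust the cases; the die example and the helper name `probabilities` are hypothetical.

```python
from fractions import Fraction


def probabilities(alternatives):
    """Map each original alternative to its probability, assuming its listed
    sub-alternatives are indivisible, exclusive, exhaustive and indifferent."""
    subs = [s for group in alternatives.values() for s in group]
    if len(subs) != len(set(subs)):
        raise ValueError("sub-alternatives must be mutually exclusive")
    total = len(subs)
    return {name: Fraction(len(group), total) for name, group in alternatives.items()}


# Hypothetical example: the six faces of a die as the indivisible sub-alternatives.
die = {
    "even": {2, 4, 6},
    "one": {1},
    "three or five": {3, 5},
}
print(probabilities(die))
# {'even': Fraction(1, 2), 'one': Fraction(1, 6), 'three or five': Fraction(1, 3)}
```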

25. We can now return to the numerous paradoxes which
arise in the study of geometrical probability (see §§ 7, 8). The
qualification of § 21 enables us, I think, to discover the source
of the confusion. Our alternatives in these problems relate to
certain areas or segments or arcs, and however small the elements
are which we adopt as our alternatives, they are made up of yet
smaller elements which would also serve as alternatives. Our
rule, therefore, is not satisfied, and, as long as we enunciate them
in this shape, we cannot employ the Principle of Indifference.
But it is easy in most cases to discover another set of alternatives
which do satisfy the condition, and which will often serve our
purpose equally well. Suppose, for instance, that a point lies
on a line of length $m \cdot l$. We may write the alternative 'the interval
of length l on which the point lies is the xth interval of that
length as we move along the line from left to right' $\equiv \phi(x)$; and
the Principle of Indifference can then be applied safely to the m
alternatives $\phi(1), \phi(2), \ldots, \phi(m)$, the number m increasing as the
length l of the intervals is diminished. There is no reason why
l should not be of any definite length however small.
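A short sketch may show how the m alternatives are used in practice. Each $\phi(x)$ receives probability $1/m$; the probability that the point lies in a segment at the left end of the line is then bounded below by the intervals lying wholly inside the segment and above by those it reaches into. The choice of question (a left-hand segment of length t) and the helper name `bounds` are our own illustrative assumptions, not part of the text.

```python
from fractions import Fraction


def bounds(t, l, m):
    """Bounds on the probability that the point lies in [0, t], on a line of
    length m*l whose m intervals phi(1), ..., phi(m) each have probability 1/m."""
    t, l = Fraction(t), Fraction(l)
    if not 0 <= t <= m * l:
        raise ValueError("t must lie on the line")
    inside = t // l          # intervals lying wholly within [0, t]
    meeting = -(-t // l)     # intervals that [0, t] reaches into (a ceiling)
    return Fraction(inside, m), Fraction(meeting, m)


# A line of unit length and a segment of length 1/3, for finer and finer intervals.
for m in (10, 100, 1000):
    print(m, bounds(Fraction(1, 3), Fraction(1, m), m))
```

Each step stays within the Principle of Indifference applied to a finite set of intervals of definite length; the two bounds close in on each other only as l is made smaller.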

If we deal with the problems of geometrical probability in
this way, we shall avoid the contradictory conclusions which
arise from confusing together distinct elementary areas. In the
problem, for instance, of the chord drawn at random in a circle,
which is discussed in § 7, the chord is regarded, not as a one-dimensional
line, but as the limit of an area, the shape of which
is different in each of the variant solutions. In the first solution
it is the limit of a triangle, the length of the base of which tends
to zero; in the second solution it is the limit of a quadrilateral,
two of the sides of which are parallel and at a distance apart
which tends to zero; and in the third solution the area is defined
by the limiting position of a central section of undefined shape.
These distinct hypotheses lead inevitably to different results. If
we were dealing with a strictly linear chord, the Principle of
Indifference would yield us no result, as we could not enunciate
the alternatives in the required form; and if the chord is an
elementary area, we must know the shape of the area of which
it is the limit. So long as we are careful to enunciate the alternatives
in a form to which the Principle of Indifference can be
applied unambiguously, we shall be prevented from confusing
together distinct problems, and shall be able to reach conclusions
in geometrical probability which are unambiguously valid.
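The dependence of the answer on the shape of the limiting area can be seen numerically. The simulation below assumes that the problem of § 7 is Bertrand's classic one, namely the probability that a random chord of a circle is longer than the side of the inscribed equilateral triangle. The three sampling rules are our reading of the three limiting shapes just described (a thin triangle with its apex on the circumference, a thin strip between parallel chords, and a small region about the chord's midpoint); they are an illustration, not a reconstruction of the solutions in § 7.

```python
import math
import random

SIDE = math.sqrt(3)  # side of the equilateral triangle inscribed in a unit circle


def by_endpoints(rng):
    """Thin triangle: both ends of the chord uniform on the circumference."""
    t1, t2 = rng.uniform(0, 2 * math.pi), rng.uniform(0, 2 * math.pi)
    return 2 * abs(math.sin((t1 - t2) / 2))


def by_distance(rng):
    """Thin strip: distance of the chord from the centre uniform on [0, 1]."""
    d = rng.random()
    return 2 * math.sqrt(1 - d * d)


def by_midpoint(rng):
    """Small region about the midpoint: midpoint uniform over the disc's area."""
    r = math.sqrt(rng.random())  # distance from centre of a point uniform in the disc
    return 2 * math.sqrt(1 - r * r)


def estimate(chord_length, n=100_000, seed=1):
    """Monte Carlo estimate of P(chord longer than SIDE) under one sampling rule."""
    rng = random.Random(seed)
    return sum(chord_length(rng) > SIDE for _ in range(n)) / n


for rule in (by_endpoints, by_distance, by_midpoint):
    print(rule.__name__, round(estimate(rule), 3))  # roughly 1/3, 1/2 and 1/4
```

The exact values under these three rules are 1/3, 1/2 and 1/4. Each rule is a perfectly definite problem; the paradox comes only from treating them as one.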

The substance of this explanation can be put in a slightly
different way by saying that it is not a matter of indifference in
these cases in what manner we proceed to the limit. We must
assign the probabilities before proceeding to the limit, which
we can do unambiguously. But if the problem in hand does
not stop at small finite lengths, areas, or volumes, and we
have to proceed to the limit, then the final result depends upon
the shape in which the body approaches the limit. Mathematicians
will recognise an analogy between this case and the determination
of potential at points within a conductor. Its value
depends upon the shape of the area which in the limit represents
the point.

26. The positive contributions of this chapter to the determination
of valid judgments of equiprobability are two. In the
first place we have stated the Principle of Indifference in a more
accurate form, by displaying its necessary dependence upon
judgments of relevance and so bringing out the hidden element
of direct judgment or intuition, which it has always involved.
It has been shown that the Principle lays down a rule by which
direct judgments of relevance and irrelevance can lead on to
judgments of preference and indifference. In the second place,
some types of consideration, which are in fact relevant, but which
are in danger of being overlooked, have been brought into prominence.
By this means it has been possible to avoid the various
types of doubtful and contradictory conclusions to which the
Principle seemed to lead, so long as we applied it without due
qualification.


CHAPTER  V 

OTHER    METHODS    OF   DETERMINING    PROBABILITIES 

1. THE recognition of the fact that not all probabilities are
numerical limits the scope of the Principle of Indifference. It
has always been agreed that a numerical measure can actually
be obtained in those cases only in which a reduction to a set of
exclusive and exhaustive equiprobable alternatives is practicable.
Our previous conclusion that numerical measurement is often
impossible agrees very well, therefore, with the argument of the
preceding chapter that the rules, in virtue of which we can assert
equiprobability, are somewhat limited in their field of application.

But  the  recognition  of  this  same  fact  makes  it  more  necessary 
to  discuss  the  principles  which  will  justify  comparisons  of  more 
and  less  between  probabilities,  where  numerical  measurement  is 
theoretically,  as  well  as  practically,  impossible.  We  must,  for 
the  reasons  given  in  the  preceding  chapter,  rely  in  the  last  resort 
on  direct  judgment.  The  object  of  the  following  rules  and 
principles  is  to  reduce  the  judgments  of  preference  and  relevance, 
which we are compelled to make, to a few relatively simple types.¹

2. We will enquire first in what circumstances we can expect
a comparison of more and less to be theoretically possible. I
am inclined to think that this is a matter about which, rather
unexpectedly perhaps, we are able to lay down definite rules.
We are able, I think, always to compare a pair of probabilities
which are

(i.) of the type $ab/h$ and $a/h$,
or (ii.) of the type $a/hh_1$ and $a/h$,

provided the additional evidence $h_1$ contains only one independent
piece of relevant information.

¹ Parts of Chap. XV. are closely connected with the topics of the following
paragraphs, and the discussion which is commenced here is concluded there.

(i.) The propositions of Part II. will enable us to prove that
$ab/h < a/h$ unless $b/ah = 1$;

that  is  to  say,  the  probability  of  our  conclusion  is  diminished  by 
the  addition  to  it  of  something,  which  on  the  hypothesis  of  our 
argument  cannot  be  inferred  from  it.  This  proposition  will  be 
self-evident  to  the  reader.  The  rule,  that  the  probability  of  two 
propositions  jointly  is,  in  general,  less  than  that  of  either  of  them 
separately,  includes  the  rule  that  the  attribution  of  a  more 
specialised  concept  is  less  probable  than  the  attribution  of  a  less 
specialised  concept. 
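In the special case where the probabilities are numerical, the result follows from the multiplication rule $ab/h = a/h \cdot b/ah$: provided $a/h > 0$, we have $ab/h < a/h$ exactly when $b/ah < 1$. The sketch below checks this on a hypothetical model of two fair dice, in which h is taken to make all 36 outcomes equally probable; the propositions a, b and c are our own examples.

```python
from fractions import Fraction
from itertools import product

# Hypothetical evidence h: two fair dice, all 36 outcomes equally probable.
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event, given=lambda w: True):
    """Numerical probability event/given on the model above."""
    cases = [w for w in OUTCOMES if given(w)]
    return Fraction(sum(1 for w in cases if event(w)), len(cases))


def conj(p, q):
    return lambda w: p(w) and q(w)


a = lambda w: w[0] % 2 == 0   # the first die shows an even number
b = lambda w: sum(w) >= 7     # the total is at least 7
c = lambda w: w[0] >= 2       # the first die shows at least 2 (follows from a)

assert prob(conj(a, b)) == prob(a) * prob(b, given=a)   # ab/h = a/h . b/ah
print(prob(a), prob(conj(a, b)), prob(conj(a, c)))       # 1/2 1/3 1/2
```

Adding b, which cannot be inferred from a, lowers the probability; adding c, which can, leaves it unchanged.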

(ii.) This condition requires a little more explanation. It
states that the probability $a/hh_1$ is always greater than, equal to,
or less than the probability $a/h$, if $h_1$ contains no pair of complementary
and independent parts¹ both relevant to $a/h$. If $h_1$
is favourable, $a/hh_1 > a/h$. Similarly, if $h_2$ is favourable to $a/hh_1$,
$a/hh_1h_2 > a/hh_1$. The reverse holds if $h_1$ and $h_2$ are unfavourable.
Thus we can compare $a/hh'$ and $a/h$ in every case in which the
relevant independent parts of the additional evidence $h'$ are
either all favourable or all unfavourable. In cases in which our
additional evidence is equivocal, part taken by itself being favourable
and part unfavourable, comparison is not necessarily possible.
In ordinary language we may assert that, according to our rule,
the addition to our evidence of a single fact always has a definite
bearing on our conclusion. It either leaves its probability unaffected
and is irrelevant, or it has a definitely favourable or
unfavourable bearing, being favourably or unfavourably relevant.
It cannot affect the conclusion in an indefinite way, which allows
no comparison between the two probabilities. But if the addition
of one fact is favourable, and the addition of a second is unfavourable,
it is not necessarily possible to compare the probability of
our original argument with its probability when it has been
modified by the addition of both the new facts.
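The last point can be illustrated, again only in the special case of numerical probabilities, by two hypothetical models. In both, $h_1$ is favourable to $a/h$ and $h_2$ is unfavourable to $a/hh_1$; yet $a/hh_1h_2$ lies above $a/h$ in one model and below it in the other, so the direction of each addition taken separately does not settle the comparison. The weights are invented for the illustration: each world is a triple recording whether $h_1$, $h_2$ and a hold, and h is taken to be the whole model.

```python
from fractions import Fraction

# Hypothetical weighted worlds (h1, h2, a) -> weight; h is the whole model.
MODEL_UP = {(1, 1, 1): 3, (1, 1, 0): 1, (1, 0, 1): 1, (0, 0, 1): 1, (0, 0, 0): 4}
MODEL_DOWN = {(1, 1, 1): 1, (1, 1, 0): 3, (1, 0, 1): 6, (0, 0, 1): 3, (0, 0, 0): 7}


def prob_a(model, *given):
    """a/h h_i ..., where `given` lists indices (0 for h1, 1 for h2) taken as true."""
    worlds = {w: wt for w, wt in model.items() if all(w[i] for i in given)}
    return Fraction(sum(wt for w, wt in worlds.items() if w[2]), sum(worlds.values()))


for name, model in (("up", MODEL_UP), ("down", MODEL_DOWN)):
    base, with_h1, with_both = prob_a(model), prob_a(model, 0), prob_a(model, 0, 1)
    assert with_h1 > base and with_both < with_h1  # h1 favourable, h2 unfavourable
    print(name, base, with_h1, with_both)
```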

Other comparisons are possible by a combination of these
two principles with the Principle of Indifference. We may
find, for instance, that $a/hh_1 > a/h$, that $a/h = b/h$, that $b/h > b/hh_2$,
and that, therefore, $a/hh_1 > b/hh_2$. We have thus obtained a
comparison between a pair of probabilities which are not
of the types discussed above, but without the introduction
of any fresh principle. We may denote comparisons of this
type by (iii.).

¹ See Chap. IV. § 14 for the meaning of these terms.
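Comparisons of type (iii.) are obtained by chaining judgments of the forms '>' and '='. A minimal sketch of that bookkeeping is given below, under assumptions of our own: probabilities are named by strings such as "a/hh1", and only the relations > and = are recorded. It adds nothing to the logic, since every judgment it starts from must still be supplied by the principles above or by direct judgment.

```python
from collections import deque


def derive(judgments, x, y):
    """Compare probabilities x and y by chaining recorded judgments.

    judgments: iterable of (left, rel, right) with rel in {'>', '='}.
    Returns '>', '<', '=' or None when no chain connects them.
    """
    edges = {}
    for left, rel, right in judgments:
        edges.setdefault(left, []).append((right, rel == ">"))
        if rel == "=":
            edges.setdefault(right, []).append((left, False))

    def chain(src, dst):
        # True: a chain from src down to dst with at least one strict step;
        # False: a chain of equalities only; None: no chain at all.
        seen, queue, found = set(), deque([(src, False)]), None
        while queue:
            node, strict = queue.popleft()
            if (node, strict) in seen:
                continue
            seen.add((node, strict))
            if node == dst:
                if strict:
                    return True
                found = False
            for nxt, step_strict in edges.get(node, []):
                queue.append((nxt, strict or step_strict))
        return found

    forward, backward = chain(x, y), chain(y, x)
    if forward:
        return ">"
    if backward:
        return "<"
    if forward is False:
        return "="
    return None


JUDGMENTS = [("a/hh1", ">", "a/h"), ("a/h", "=", "b/h"), ("b/h", ">", "b/hh2")]
print(derive(JUDGMENTS, "a/hh1", "b/hh2"))  # >
```

Where no chain of recorded judgments connects two probabilities, the function returns None, which corresponds to the case where no comparison has been established.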

3. Whether any comparisons are possible which do not fall
within any of the categories (i.), (ii.), or (iii.), I do not feel certain.
We undoubtedly make a number of direct comparisons which
do not seem to be covered by them. We judge it more probable,
for instance, that Caesar invaded Britain than that Romulus
founded Rome. But even in such cases as this, where a reduction
into the regular form is not obvious, it might prove possible if
we could clearly analyse the real grounds of our judgment. We
might argue in this instance that, whereas Romulus's founding of
Rome rests solely on tradition, we have in addition evidence of
another kind for Caesar's invasion of Britain, and that, in so
far as our belief in Caesar's invasion rests on tradition, we have
reasons of a precisely similar kind as for our belief in Romulus
without the additional doubt involved in the maintenance of a
tradition between the times of Romulus and Caesar. By some
such analysis as this our judgment of comparison might be
brought within the above categories.

The process of reaching a judgment of comparison in this way
may be called 'schematisation.'¹ We take initially an ideal
scheme which falls within the categories of comparison. Let
us represent 'the historical tradition x has been handed down
from a date many years previous to the time of Caesar' by
$\psi_1(x)$; 'the historical tradition x has been handed down from
the time of Caesar' by $\psi_2(x)$; 'the historical tradition x has
extra-traditional support' by $\psi_3(x)$; and the two traditions,
the Romulus tradition and the Caesar tradition respectively,
by a and b. Then if our relevant evidence h were of the form
$\psi_1(a)\,\psi_2(b)\,\psi_3(b)$, it is easily seen that the comparison $a/h < b/h$
could be justified on the lines laid down above.² A further judgment,
that our actual evidence presented no relevant divergence
from this schematic form, would then establish the practical
conclusion. As I am not aware of any plausible judgment of
comparison which we make in common practice, but which is
clearly incapable of reduction to some schematic form, and as
I see no logical basis for such a comparison, I feel justified in
doubting the possibility of comparing the probabilities of arguments
dissimilar in form and incapable of schematic reduction.
But the point must remain very doubtful until this part of the
subject has received a more prolonged consideration.

¹ This phrase is used by Von Kries, op. cit. p. 179, in a somewhat similar
connection.

² For $a/\psi_1(a) < a/\psi_2(a)$; $a/\psi_2(a) = b/\psi_2(b)$; $b/\psi_2(b) < b/\psi_2(b)\psi_3(b)$;
$a/\psi_1(a) = a/h$; and $b/\psi_2(b)\psi_3(b) = b/h$.

4. Category (ii.) is very wide, and evidently covers a great
variety of cases. If we are to establish general principles of argument
and so avoid excessive dependence on direct individual
judgments of relevance, we must discover some new and more
particular principles included within it. Two of these, those
of Analogy and of Induction, are excessively important, and
will be the subject of Part III. of this book. In addition to these
a few criteria will be examined and established in Chapter XIV.,
§§ 4 and 8 (49.1). We must be content here (pending the
symbolic developments of Part II.) with the two observations
following:

(1) The addition of new¹ evidence $h_1$ to a doubtful² argument
$a/h$ is favourably relevant, if either of the following conditions
is fulfilled: (a) if $a/h\bar{h}_1 = 0$; (b) if $a/hh_1 = 1$. Divested of symbolism,
this merely amounts to a statement that a piece of
evidence is favourable if, in conjunction with the previous
evidence, it is either a necessary or a sufficient condition for the
truth of our conclusion.
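In the special case where the probabilities concerned are numerical, both conditions can be checked from the addition and multiplication rules. The short derivation below assumes, in addition to the definitions of 'new' and 'doubtful' evidence used here, that $h_1$ is not impossible on h, so that $a/hh_1$ is defined. It is offered as a sketch only, not as the symbolic treatment promised for Part II.

```latex
\begin{aligned}
a/h &= (a/hh_1)(h_1/h) + (a/h\bar{h}_1)(\bar{h}_1/h),
  \qquad 0 < h_1/h < 1, \quad 0 < a/h < 1. \\[4pt]
\text{(a)}\ a/h\bar{h}_1 = 0 &\;\Rightarrow\; a/h = (a/hh_1)(h_1/h);
  \ \text{as } a/h > 0,\ a/hh_1 > 0, \text{ and as } h_1/h < 1,\ a/h < a/hh_1. \\[4pt]
\text{(b)}\ a/hh_1 = 1 &\;\Rightarrow\; a/hh_1 = 1 > a/h,
  \ \text{since the argument is doubtful.}
\end{aligned}
```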

(2) It might plausibly be supposed that evidence would be
favourable to our conclusion which is favourable to favourable
evidence, i.e. that, if $h_1$ is favourable to $x/h$ and x is favourable to
$a/h$, $h_1$ is favourable to $a/h$. Whilst, however, this argument
is frequently employed under conditions which, if explicitly
stated, would justify it, there are also conditions in which this is
not so, so that it is not necessarily valid. For the very deceptive
fallacy involved in the above supposition, Mr. Johnson has
suggested to me the name of the Fallacy of the Middle Term. The
general question (if $h_1$ is favourable to $x/h$ and x is favourable to
$a/h$, in what conditions is $h_1$ favourable to $a/h$?) will be examined
in Chapter XIV. §§ 4 and 8 (49.1). In the meantime, the intuition
of the reader towards the fallacy may be assisted by the
following observations, which are due to Mr. Johnson:

Let x, x', x'' . . . be exclusive and exhaustive alternatives
under datum h. Let $h_1$ and a be concordant in regard to each of
these alternatives: i.e. any hypothesis which is strengthened by
$h_1$ will strengthen a, and any hypothesis which is weakened by
$h_1$ will weaken a. It is obvious that, if $h_1$ strengthens some of
the hypotheses x, x', x'' . . ., it will weaken others. This fact
helps us to see why we cannot consider the concordance of $h_1$
and a in regard to one single alternative, but must be able to
assert their concordance with regard to every one of the exclusive
and exhaustive alternatives, including the particular one taken.
But a further condition is needed, which (as we shall show) is
obviously satisfied in two typical problems at least. This further
condition is that, for each hypothesis x, x', x'' . . ., it shall hold
that, were this hypothesis known to be true, the knowledge of
$h_1$ would not weaken the probability of a.

¹ $h_1$ is new evidence so long as $h_1/h \neq 1$.
² The argument is doubtful so long as $a/h$ is neither certain nor impossible.

These two conditions are sufficient to ensure that $h_1$ shall
strengthen a (independently of knowledge of x, x', x'' . . .);
and, in a sense, they appear to be necessary; for, unless they are
satisfied, the dependence of $h_1$ upon a would be (so to speak)
accidental as regards the 'middle terms' (x, x', x'' . . .).
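A small numerical counter-example shows why concordance alone is not enough. The seven equally probable cases below are invented for the illustration, with only two alternatives, x and not-x. Here $h_1$ is favourable to x and x is favourable to a, and the concordance condition holds for both alternatives; yet $h_1$ is unfavourable to a, because the further condition fails: on the hypothesis x, the knowledge of $h_1$ weakens a.

```python
from fractions import Fraction

# Hypothetical evidence h: seven equally probable cases, each a triple (x, h1, a).
CASES = [
    (True, True, False),
    (True, True, False),
    (True, False, True),
    (True, False, True),
    (False, True, True),
    (False, False, False),
    (False, False, False),
]

x = lambda c: c[0]
not_x = lambda c: not c[0]
h1 = lambda c: c[1]
a = lambda c: c[2]


def prob(event, given=lambda c: True):
    """Numerical probability event/given over the equally probable cases."""
    pool = [c for c in CASES if given(c)]
    return Fraction(sum(1 for c in pool if event(c)), len(pool))


def conj(p, q):
    return lambda c: p(c) and q(c)


print(prob(x), prob(x, h1))    # 4/7 2/3  (h1 favourable to x)
print(prob(a), prob(a, x))     # 3/7 1/2  (x favourable to a)
print(prob(a, h1))             # 1/3      (yet h1 unfavourable to a)
print(prob(a, conj(x, h1)))    # 0        (given x, h1 weakens a)
```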

The necessity for reference to all the alternatives x, x', x'' . . .
is analogous to the requirement of distribution of the middle
term in ordinary syllogism. Thus, from premises "All P is x,
all S is x," the conclusion that "S's are P" does not formally
follow; but given "all P is x and all S is x'" it does follow that
"no S are P", where x' is any contrary to x. The two conditions
taken together would be analogous to the argument: all xS is
P; all x'S is P; all x''S is P; . . . therefore all S is P.

First Typical Problem. An urn contains an unknown proportion
of differently coloured balls. A ball is drawn and replaced.
Then x, x', x'' . . . stand for the various possible proportions.
Let $h_1$ mean "a white ball has been drawn"; and let a mean
"a white ball will be again drawn." Then any hypothesis which
is strengthened by $h_1$ will strengthen a; and any hypothesis
which is weakened by $h_1$ will weaken a. Moreover, were any
one of these hypotheses known to be true, the knowledge of $h_1$
would not weaken the probability of a. Hence, in the absence
of definite knowledge as regards x, x', x'' . . ., the knowledge
of $h_1$ would strengthen the probability of a.
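Under assumptions of our own (a finite set of possible proportions with a stated prior, and draws independent once the proportion is fixed), the First Typical Problem can be worked numerically. The helper `white_again` and the particular priors are hypothetical; the function returns the pair $(a/h,\ a/hh_1)$ and assumes $a/h > 0$.

```python
from fractions import Fraction


def white_again(prior):
    """Return (a/h, a/hh1) for an urn whose proportion p of white balls has the
    given prior {p: probability}; draws are with replacement, so independent given p."""
    p_a = sum(weight * p for p, weight in prior.items())              # a/h (equal to h1/h)
    p_h1_and_a = sum(weight * p * p for p, weight in prior.items())   # h1 a / h
    return p_a, p_h1_and_a / p_a


third = Fraction(1, 3)
print(white_again({Fraction(0): third, Fraction(1, 2): third, Fraction(1): third}))
print(white_again({Fraction(1, 2): Fraction(1)}))  # proportion known: no change
```

When the prior is spread over more than one proportion, $a/hh_1$ exceeds $a/h$, because the mean of $p^2$ exceeds the square of the mean of p; when the proportion is known, $h_1$ leaves a unaffected, which is consistent with the condition that knowledge of $h_1$ should not weaken a.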

Second  Typical  Problem. — Let  a  certain  event  have  taken 
place  ;  which  may  have  been  x,  x' ,  x"  or  .  .  .  Let  h±  mean  that 
A  reports  so  and  so  ;  and  let  a  mean  that  li  reports  similarly  or 
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identically.  The  phrase  similarly  merely  indicates  that  any 
hypothesis  as  to  the  actual  fact,  which  would  be  strengthened  by 
A's  report,  would  be  strengthened  by  B's  report.  Of  course, 
even  if  the  reports  were  verbally  identical,  A's  evidence  would  not 
necessarily  strengthen  the  hypothesis  in  an  equal  degree  with 
B's  ;  because  A  and  B  may  be  unequally  expert  or  intelligent. 
Now,  in  such  cases,  we  may  further  affirm  (in  general),  that,  were 
the  actual  nature  of  the  event  known,  the  knowledge  of  A's  report 
on  it  would  not  weaken  (though  it  also  need  not  strengthen)  the 
probability  that  B  would  give  a  similar  report.  Hence,  in  the 
absence of such knowledge, the knowledge of $h_1$ would strengthen
the  probability  of  a. 
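The testimony case admits a similar check. This is a minimal sketch with hypothetical numbers that are not in the text. The event is assumed to be one of three equally probable alternatives, and each witness is assumed to name the truth with a fixed reliability, or otherwise one of the other alternatives at random. The reliabilities are set unequal, as the text allows: A gets 9/10 and B gets 3/5. The two reports are assumed independent once the event is fixed.

```python
from fractions import Fraction

# Hypothetical model: the event was one of three alternatives,
# equally probable beforehand.
EVENTS = ["x", "x'", "x''"]
PRIOR = {event: Fraction(1, 3) for event in EVENTS}

# Hypothetical reliabilities: A is more expert than B.
RELIABILITY_A = Fraction(9, 10)
RELIABILITY_B = Fraction(3, 5)

def report_prob(reliability, report, event):
    """Chance that a witness names `report` when `event` actually occurred:
    the truth with probability `reliability`, otherwise one of the other
    alternatives at random."""
    if report == event:
        return reliability
    return (1 - reliability) / (len(EVENTS) - 1)

# a/h: probability that B reports x, before hearing A.
p_b = sum(PRIOR[e] * report_prob(RELIABILITY_B, "x", e) for e in EVENTS)

# The reports are independent once the event is fixed, so A's report tells
# us nothing further about B's report when the event is known.
p_a = sum(PRIOR[e] * report_prob(RELIABILITY_A, "x", e) for e in EVENTS)
p_both = sum(PRIOR[e] * report_prob(RELIABILITY_A, "x", e)
             * report_prob(RELIABILITY_B, "x", e) for e in EVENTS)
p_b_given_a = p_both / p_a  # a/h h1

print(p_b, p_b_given_a)  # 1/3 14/25
```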

5.  Before  leaving  this  part  of  the  argument  we  must  emphasise 
the  part  played  by  direct  judgment  in  the  theory  here  presented. 
The  rules  for  the  determination  of  equality  and  inequality  between 
probabilities  all  depend  upon  it  at  some  point.  This  seems  to 
me  quite  unavoidable.  But  I  do  not  feel  that  we  should  regard 
it  as  a  weakness.  For  we  have  seen  that  most,  and  perhaps  all, 
cases  can  be  determined  by  the  application  of  general  principles 
to  one  simple  type  of  direct  judgment.  No  more  is  asked  of  the 
intuitive  power  applied  to  particular  cases  than  to  determine 
whether  a  new  piece  of  evidence  tells,  on  the  whole,  for  or  against 
a  given  conclusion.  The  application  of  the  rules  involves  no 
wider  assumptions  than  those  of  other  branches  of  logic. 

While  it  is  important,  in  establishing  a  control  of  direct 
judgment  by  general  principles,  not  to  conceal  its  presence,  yet 
the  fact  that  we  ultimately  depend  upon  an  intuition  need  not 
lead  us  to  suppose  that  our  conclusions  have,  therefore,  no  basis 
in  reason,  or  that  they  are  as  subjective  in  validity  as  they  are 
in  origin.  It  is  reasonable  to  maintain  with  the  logicians  of  the 
Port  Royal  that  we  may  draw  a  conclusion  which  is  truly  probable 
by  paying  attention  to  all  the  circumstances  which  accompany 
the  case,  and  we  must  admit  with  as  little  concern  as  possible 
Hume's  taunt  that  "  when  we  give  the  preference  to  one  set  of 
arguments  above  another,  we  do  nothing  but  decide  from  our 
feeling  concerning  the  superiority  of  their  influence." 


CHAPTER  VI 

THE   WEIGHT   OF   ARGUMENTS 

1.  THE  question  to  be  raised  in  this  chapter  is  somewhat  novel ; 
after  much  consideration  I  remain  uncertain  as  to  how  much 
importance  to  attach  to  it.  The  magnitude  of  the  probability 
of  an  argument,  in  the  sense  discussed  in  Chapter  III.,  depends 
upon  a  balance  between  what  may  be  termed  the  favourable  and 
the  unfavourable  evidence  :  a  new  piece  of  evidence  which  leaves 
this balance unchanged, also leaves the probability of the argument
unchanged. But it seems that there may be another
respect  in  which  some  kind  of  quantitative  comparison  between 
arguments  is  possible.  This  comparison  turns  upon  a  balance, 
not  between  the  favourable  and  the  unfavourable  evidence,  but 
between  the  absolute  amounts  of  relevant  knowledge  and  of 
relevant  ignorance  respectively. 

As the relevant evidence at our disposal increases, the magnitude
of the probability of the argument may either decrease or
increase, according as the new knowledge strengthens the unfavourable
or the favourable evidence; but something seems to
have increased in either case: we have a more substantial basis
upon which to rest our conclusion. I express this by saying that
an accession of new evidence increases the weight of an argument.
New evidence will sometimes decrease the probability of
an argument, but it will always increase its 'weight.'

2. The measurement of evidential weight presents similar
difficulties  to  those  with  which  we  met  in  the  measurement  of 
probability.  Only  in  a  restricted  class  of  cases  can  we  compare 
the  weights  of  two  arguments  in  respect  of  more  and  less.  But 
this  must  always  be  possible  where  the  conclusion  of  the  two 
arguments is the same, and the relevant evidence in the one includes
and exceeds the evidence in the other. If the new evidence
is 'irrelevant,' in the more precise of the two senses defined in § 14
of Chapter IV., the weight is left unchanged. If any part of the
new evidence is relevant, then the weight is increased.
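The comparison rule of this section can be sketched minimally. The sketch makes a simplifying assumption that the text does not: each item of evidence is classed once and for all as relevant or irrelevant to the conclusion, whereas in the book relevance is judged against the rest of the evidence. Weights are then compared only by inclusion of the relevant evidence, so two bodies of evidence where neither includes the other come out as incomparable. The function name `compare_weight` and the item labels are hypothetical.

```python
from typing import FrozenSet, Optional

def compare_weight(evidence_1: FrozenSet[str], evidence_2: FrozenSet[str],
                   relevant: FrozenSet[str]) -> Optional[str]:
    """Compare the weights of two arguments with the same conclusion.

    Only relevant items count, so adding irrelevant evidence leaves the
    weight unchanged. Returns '>', '<' or '=' when one body of relevant
    evidence includes the other, and None when they cannot be compared.
    """
    r1 = evidence_1 & relevant
    r2 = evidence_2 & relevant
    if r1 == r2:
        return "="
    if r1 > r2:  # proper superset: includes and exceeds
        return ">"
    if r1 < r2:
        return "<"
    return None

RELEVANT = frozenset({"h1", "h2"})  # hypothetical relevant items
print(compare_weight(frozenset({"h1", "h2"}), frozenset({"h1"}), RELEVANT))  # >
```

Returning `None` rather than forcing an order reflects the text's point that weights are comparable only in a restricted class of cases.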

The  reason  for  our  stricter  definition  of  '  relevance  '  is  now 
apparent.  If  we  are  to  be  able  to  treat '  weight '  and  '  relevance ' 
as  correlative  terms,  we  must  regard  evidence  as  relevant,  part 
of  which  is  favourable  and  part  unfavourable,  even  if,  taken  as 
a whole, it leaves the probability unchanged. With this definition,
to say that a new piece of evidence is 'relevant' is the same
thing  as  to  say  that  it  increases  the  '  weight '  of  the  argument. 

A  proposition  cannot  be  the  subject  of  an  argument,  unless 
we  at  least  attach  some  meaning  to  it,  and  this  meaning,  even  if 
it  only  relates  to  the  form  of  the  proposition,  may  be  relevant 
in  some  arguments  relating  to  it.  But  there  may  be  no  other 
relevant  evidence  ;  and  it  is  sometimes  convenient  to  term  the 
probability of such an argument an a priori probability. In
this case the weight of the argument is at its lowest. Starting,
therefore, with minimum weight, corresponding to a priori
probability, the evidential weight of an argument rises, though
its  probability  may  either  rise  or  fall,  with  every  accession  of 
relevant  evidence. 

3.  Where  the  conclusions  of  two  arguments  are  different,  or 
where  the  evidence  for  the  one  does  not  overlap  the  evidence 
for  the  other,  it  will  often  be  impossible  to  compare  their  weights, 
just  as  it  may  be  impossible  to  compare  their  probabilities.  Some 
rules  of  comparison,  however,  exist,  and  there  seems  to  be  a  close, 
though  not  a  complete,  correspondence  between  the  conditions 
under  which  pairs  of  arguments  are  comparable  in  respect  of 
probability  and  of  weight  respectively.  We  found  that  there  were 
three  principal  types  in  which  comparison  of  probability  was 
possible,  other  comparisons  being  based  on  a  combination  of 
these  : — 

(i.) Those based on the Principle of Indifference, subject
to certain conditions, and of the form $\phi a/\psi a.h_1 = \phi b/\psi b.h_2$,
where $h_1$ and $h_2$ are irrelevant to the arguments.

(ii.) $a/hh_1 \gtreqless a/h$, where $h_1$ is a single unit of information,
containing no independent parts which are relevant.

(iii.) $ab/h \le a/h$.

Let us represent the evidential weight of the argument,
whose probability is $a/h$, by $V(a/h)$. Then, corresponding to
the above, we find that the following comparisons of weight are
possible:

(i.) $V(\phi a/\psi a.h_1) = V(\phi b/\psi b.h_2)$, where $h_1$ and $h_2$ are irrelevant
in the strict sense. Arguments, that is to say, to which the
Principle of Indifference is applicable, have equal evidential
weights.

(ii.) $V(a/hh_1) > V(a/h)$, unless $h_1$ is irrelevant, in which case
$V(a/hh_1) = V(a/h)$. The restriction on the composition of $h_1$,
which is necessary in the case of comparisons of magnitude, is
not necessary in the case of weight.

There is, however, no rule for comparisons of weight corresponding
to (iii.) above. It might be thought that $V(ab/h) < V(a/h)$,
on the ground that the more complicated an argument is, relative
to given premisses, the less is its evidential weight. But this
is invalid. The argument $ab/h$ is further off proof than was the
argument $a/h$; but it is nearer disproof. For example, if $ab/h = 0$
and $a/h > 0$, then $V(ab/h) > V(a/h)$. In fact it would seem to
be the case that the weight of the argument $a/h$ is always
equal to that of $\bar{a}/h$, where $\bar{a}$ is the contradictory of a; i.e.,
$V(a/h) = V(\bar{a}/h)$. For an argument is always as near proving or
disproving a proposition, as it is to disproving or proving its
contradictory.

4. It may be pointed out that if $a/h = b/h$, it does not necessarily
follow that $V(a/h) = V(b/h)$. It has been asserted already
that if the first equality follows directly from a single application of
the Principle of Indifference, the second equality also holds. But
the first equality can exist in other cases also. If, for instance,
a and b are members respectively of different sets of three equally
probable exclusive and exhaustive alternatives, then $a/h = b/h$; but
these arguments may have very different weights. If, however,
a and b can each, relatively to h, be inferred from the other, i.e. if
$a/bh = 1$ and $b/ah = 1$, then $V(a/h) = V(b/h)$. For in proving or
disproving one, we are necessarily proving or disproving the other.

Further principles could, no doubt, be arrived at. The above
can be combined to reach results in cases upon which unaided
common-sense might feel itself unable to pronounce with confidence.
Suppose, for instance, that we have three exclusive
and exhaustive alternatives, a, b, and c, and that $a/h = b/h$
in virtue of the Principle of Indifference; then we have
$V(a/h) = V(b/h)$ and $V(a/h) = V(\bar{a}/h)$, so that $V(b/h) = V(\bar{a}/h)$. It is
also true, since $\bar{a}/(b + c)h = 1$ and $(b + c)/\bar{a}h = 1$, that
$V(\bar{a}/h) = V((b + c)/h)$. Hence $V(b/h) = V((b + c)/h)$.
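The chain of equalities in this example can be set out step by step in the chapter's notation, with $\bar{a}$ for the contradictory of a. This only restates the reasoning of the paragraph above and adds nothing to it:

```latex
\begin{aligned}
&V(a/h) = V(b/h) && \text{Principle of Indifference, since } a/h = b/h \\
&V(a/h) = V(\bar{a}/h) && \text{an argument and its contradictory} \\
&\therefore\ V(b/h) = V(\bar{a}/h) \\
&\bar{a}/(b+c)h = 1,\ (b+c)/\bar{a}h = 1 && \text{each inferable from the other, given } h \\
&\therefore\ V(\bar{a}/h) = V((b+c)/h) \\
&\therefore\ V(b/h) = V((b+c)/h)
\end{aligned}
```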

5.  The  preceding  paragraphs  will  have  made  it  clear  that  the 
weighing  of  the  amount  of  evidence  is  quite  a  separate  process 
from  the  balancing  of  the  evidence  for  and  against.     In  so  far, 
however,  as  the  question  of  weight  has  been  discussed  at  all, 
attempts  have  been  made,  as  a  rule,  to  explain  the  former  in 
terms of the latter. If $x/h_1h_2 = 2/3$ and $x/h_1 = 2/3$, it has sometimes
been supposed that it is more probable that $x/h_1h_2$ really is 2/3 than
that $x/h_1$ really is 2/3. According to this view, an increase in the
amount of evidence strengthens the probability of the probability,
or, as De Morgan would say, the presumption of the
probability. A little reflection will show that such a theory is
untenable. For the probability of x on hypothesis $h_1$ is independent
of whether as a matter of fact x is or is not true, and if
we find out subsequently that x is true, this does not make it
false to say that on hypothesis $h_1$ the probability of x is 2/3. Similarly
the fact that $x/h_1h_2$ is 2/3 does not impugn the conclusion that
$x/h_1$ is 2/3, and unless we have made a mistake in our judgment or
our calculation on the evidence, the two probabilities are 2/3 and 2/3
respectively.

6.  A  second  method,  by  which  it  might  be  thought,  perhaps, 
that  the  question  of  weight  has  been  treated,  is  the  method  of 
probable  error.     But  while  probable  error  is  sometimes  connected 
with weight, it is primarily concerned with quite a different question.
'Probable error,' it should be explained, is the name
given, rather inconveniently perhaps, to an expression which
arises when we consider the probability that a given quantity is
measured by one of a number of different magnitudes. Our
data  may  tell  us  that  one  of  these  magnitudes  is  the  most  probable 
measure  of  the  quantity  ;    but  in  some  cases  it  will  also  tell 
us  how  probable  each  of  the  other  possible  magnitudes  of  the 
quantity  is.     In  such  cases  we  can  determine  the  probability 
that  the  quantity  will  have  a  magnitude  which  does  not  differ 
from  the  most  probable  by  more  than  a  specified  amount.     The 
amount,  which  the  difference  between  the  actual  value  of  the 
quantity  and  its  most  probable  value  is  as  likely  as  not  to  exceed, 
is the 'probable error.' In many practical questions the existence
of a small probable error is of the greatest importance,
if our conclusions are to prove valuable. The probability that
the quantity has any particular magnitude may be very small;
but this may matter very little, if there is a high probability
that it lies within a certain range.
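The definition of probable error given above can be sketched for a quantity with finitely many possible magnitudes. The sketch assumes a discrete reading that the text does not give: an exact "as likely as not" split is rarely available for discrete magnitudes. So it takes the smallest deviation d from the most probable magnitude such that the quantity lies within d of it with probability at least 1/2. The distribution is made up for illustration.

```python
from fractions import Fraction

def most_probable(dist):
    """Most probable magnitude; ties go to the smaller magnitude."""
    best = max(dist.values())
    return min(value for value, p in dist.items() if p == best)

def probable_error(dist):
    """Smallest d such that the quantity lies within d of its most probable
    magnitude with probability at least 1/2 (a discrete stand-in for
    'as likely as not to exceed')."""
    centre = most_probable(dist)
    for d in sorted({abs(value - centre) for value in dist}):
        within = sum(p for value, p in dist.items() if abs(value - centre) <= d)
        if within >= Fraction(1, 2):
            return d
    raise ValueError("probabilities sum to less than 1/2")

# Hypothetical data: no single magnitude has probability above 2/5 ...
dist = {8: Fraction(1, 10), 9: Fraction(1, 5), 10: Fraction(2, 5),
        11: Fraction(1, 5), 12: Fraction(1, 10)}
print(most_probable(dist), probable_error(dist))  # 10 1
# ... yet the quantity lies within 1 of 10 with probability 4/5.
```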

Now  it  is  obvious  that  the  determination  of  probable  error 
is  intrinsically  a  different  problem  from  the  determination  of 
weight.  The  method  of  probable  error  is  simply  a  summation  of 
a  number  of  alternative  and  exclusive  probabilities.  If  we  say 
that the most probable magnitude is x and the probable error y,
this is a way, convenient for many purposes, of summing up a
number of probable conclusions regarding a variety of magnitudes
other than x which, on the evidence, the quantity may
possess.  The  connection  between  probable  error  and  weight,  such 
as  it  is,  is  due  to  the  fact  that  in  scientific  problems  a  large 
probable  error  is  not  uncommonly  due  to  a  great  lack  of  evidence, 
and  that  as  the  available  evidence  increases  there  is  a  tendency 
for  the  probable  error  to  diminish.  In  these  cases  the  probable 
error  may  conceivably  be  a  good  practical  measure  of  the  weight. 

It  is  necessary,  however,  in  a  theoretical  discussion,  to  point 
out  that  the  connection  is  casual,  and  only  exists  in  a  limited 
class  of  cases.  This  is  easily  shown  by  an  example.  We  may 
have data on which the probability of x = 5 is 1/3, of x = 6 is 1/4,
of x = 7 is 1/6, of x = 8 is 1/8, and of x = 9 is 1/8. Additional evidence
might show that x must either be 5 or 8 or 9, the probabilities of
each of these conclusions being 7/17, 6/17, 4/17. The evidential weight
of  the  latter  argument  is  greater  than  that  of  the  former,  but  the 
probable  error,  so  far  from  being  diminished,  has  been  increased. 
There  is,  in  fact,  no  reason  whatever  for  supposing  that  the 
probable  error  must  necessarily  diminish,  as  the  weight  of  the 
argument  is  increased. 

The  typical  case,  in  which  there  may  be  a  practical  connection 
between  weight  and  probable  error,  may  be  illustrated  by  the 
two  cases  following  of  balls  drawn  from  an  urn.  In  each  case  we 
require  the  probability  of  drawing  a  white  ball  ;  in  the  first  case 
we know that the urn contains black and white in equal proportions;
in the second case the proportion of each colour is unknown,
and each ball is as likely to be black as white. It is evident that
in either case the probability of drawing a white ball is 1/2, but
that  the  weight  of  the  argument  in  favour  of  this  conclusion  is 
greater  in  the  first  case.  When  we  consider  the  most  probable 
proportion  in  which  balls  will  be  drawn  in  the  long  run,  if  after 
each withdrawal they are replaced, the question of probable
error  enters  in,  and  we  find  that  the  greater  evidential  weight  of 
the  argument  on  the  first  hypothesis  is  accompanied  by  the 
smaller  probable  error. 
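The two urn cases can be compared numerically. This is a rough sketch under assumptions of its own:

- There are 100 draws with replacement from an urn of 10 balls.
- In the first case the urn is known to be half white.
- In the second case, "each ball is as likely to be black as white" is read as each of the 10 balls being independently white with probability 1/2.

Deviations are measured from 50 white balls, the central value in both cases by symmetry. The probable error is taken as the smallest deviation within which the count of white balls falls with probability at least 1/2. The constants `DRAWS` and `URN_SIZE` are hypothetical.

```python
from fractions import Fraction
from math import comb

DRAWS = 100    # hypothetical number of draws, each ball replaced
URN_SIZE = 10  # hypothetical number of balls in the urn

def binomial_pmf(n, p):
    """Distribution of the number of white balls in n draws with replacement."""
    return [comb(n, w) * p**w * (1 - p)**(n - w) for w in range(n + 1)]

def mixture_pmf(n, urn_size):
    """Proportion unknown: each ball in the urn is independently white with
    probability 1/2, so k white balls has probability C(N, k) / 2**N."""
    total = [Fraction(0)] * (n + 1)
    for k in range(urn_size + 1):
        weight = Fraction(comb(urn_size, k), 2**urn_size)
        for w, p in enumerate(binomial_pmf(n, Fraction(k, urn_size))):
            total[w] += weight * p
    return total

def probable_error(pmf, centre):
    """Smallest d with P(|W - centre| <= d) >= 1/2."""
    for d in range(len(pmf)):
        within = sum(p for w, p in enumerate(pmf) if abs(w - centre) <= d)
        if within >= Fraction(1, 2):
            return d
    raise ValueError("pmf sums to less than 1/2")

known = binomial_pmf(DRAWS, Fraction(1, 2))  # urn known to be half white
unknown = mixture_pmf(DRAWS, URN_SIZE)       # proportion unknown
centre = DRAWS // 2
pe_known = probable_error(known, centre)
pe_unknown = probable_error(unknown, centre)
print(pe_known, pe_unknown)
```

Under these assumptions the known urn gives a probable error of 3 white balls in 100 draws. The unknown urn gives a much larger one, which matches the pairing of greater weight with smaller probable error described above.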

This  conventionalised  example  is  typical  of  many  scientific 
problems.  The  more  we  know  about  any  phenomenon,  the  less 
likely,  as  a  rule,  is  our  opinion  to  be  modified  by  each  additional 
item  of  experience.  In  such  problems,  therefore,  an  argument 
of high weight concerning some phenomenon is likely to be accompanied
by a low probable error, when the character of a series
of  similar  phenomena  is  under  consideration. 

7.  Weight  cannot,  then,  be  explained  in  terms  of  probability. 
An  argument  of  high  weight  is  not  '  more  likely  to  be  right '  than 
one  of  low  weight ;  for  the  probabilities  of  these  arguments  only 
state relations between premiss and conclusion, and these relations
are stated with equal accuracy in either case. Nor is an
argument of high weight one in which the probable error is small;
for  a  small  probable  error  only  means  that  magnitudes  in  the 
neighbourhood  of  the  most  probable  magnitude  have  a  relatively 
high  probability;  and  an  increase  of  evidence  does  not  necessarily 
involve  an  increase  in  these  probabilities. 

The  conclusion,  that  the  '  weight '  and  the  '  probability  '  of  an 
argument  are  independent  properties,  may  possibly  introduce  a 
difficulty  into  the  discussion  of  the  application  of  probability 
to  practice.1  For  in  deciding  on  a  course  of  action,  it  seems 
plausible  to  suppose  that  we  ought  to  take  account  of  the  weight 
as  well  as  the  probability  of  different  expectations.  But  it  is 
difficult  to  think  of  any  clear  example  of  this,  and  I  do  not 
feel  sure  that  the  theory  of  '  evidential  weight '  has  much 
practical  significance. 

Bernoulli's  second  maxim,  that  we  must  take  into  account  all 
the  information  we  have,  amounts  to  an  injunction  that  we  should 
be  guided  by  the  probability  of  that  argument,  amongst  those  of 
which  we  know  the  premisses,  of  which  the  evidential  weight  is 
the  greatest.  But  should  not  this  be  re-enforced  by  a  further 
maxim,  that  we  ought  to  make  the  weight  of  our  arguments  as 
great as possible by getting all the information we can?2 It is

1  See  also  Chapter  XXVI.  §  7. 

2 Cf. Locke, Essay concerning Human Understanding, book ii. chap. xxi. § 67:
"  He  that  judges  without  informing  himself  to  the  utmost  that  he  is  capable, 
cannot  acquit  himself  of  judging  amiss." 
difficult to see, however, to what point the strengthening of an
argument's weight by increasing the evidence ought to be pushed.
We may argue that, when our knowledge is slight but capable of
increase, the course of action, which will, relative to such knowledge,
probably produce the greatest amount of good, will often
consist in the acquisition of more knowledge. But there clearly
comes a point when it is no longer worth while to spend trouble,
before acting, in the acquisition of further information, and there
is  no  evident  principle  by  which  to  determine  how  far  we  ought, 
to  carry  our  maxim  of  strengthening  the  weight  of  our  argument. 
A  little  reflection  will  probably  convince  the  reader  that  this  is 
a  very  confusing  problem. 

8.  The  fundamental  distinction  of  this  chapter  may  be  briefly 
repeated.     One  argument  has  more  weight  than  another  if  it  is 
based  upon  a  greater  amount  of  relevant  evidence  ;   but  it  is  not 
always, or even generally, possible to say of two sets of propositions
that one set embodies more evidence than the other. It has
a  greater  probability  than  another  if  the  balance  in  its  favour, 
of  what  evidence  there  is,  is  greater  than  the  balance  in  favour 
of  the  argument  with  which  we  compare  it :   but  it  is  not  always, 
or  even  generally,  possible  to  say  that  the  balance  in  the  one  case 
is  greater  than  the  balance  in  the  other.     The  weight,  to  speak 
metaphorically, measures the sum of the favourable and unfavourable
evidence, the probability measures the difference.

9.  The  phenomenon  of  '  weight '  can  be  described  from  the 
point  of  view  of  other  theories  of  probability  than  that  which  is 
adopted here. If we follow certain German logicians in regarding
probability  as  being  based  on  the  disjunctive  judgment,  we  may 
say  that  the  weight  is  increased  when  the  number  of  alternatives 
is  reduced,  although,  the  ratio  of  the  number  of  favourable  to 
the  number  of  unfavourable  alternatives   may   not  have   been 
disturbed;     or,   to  adopt  the   phraseology   of  another  German 
school, we may say that the weight of the probability is increased,
as  the  field  of  possibility  is  contracted. 

The  same  distinction  may  be  explained  in  the  language  of  the 
frequency theory.1 We should then say that the weight is increased
if we are able to employ as the class of reference a class
which is contained in the original class of reference.

10.  The  subject  of  this  chapter  has  not  usually  been  discussed 

1 See Chap. VIII.
by writers on probability, and I know of only two by whom the
question has been explicitly raised:1 Meinong, who threw out a
suggestion at the conclusion of his review of Von Kries' "Principien,"
published in the Göttingische gelehrte Anzeigen for 1890
(see especially pp. 70-74), and A. Nitsche, who took up Meinong's
suggestion in an article in the Vierteljahrsschrift für wissenschaftliche
Philosophie, 1892, vol. xvi. pp. 20-35, entitled "Die Dimensionen
der Wahrscheinlichkeit und die Evidenz der Ungewissheit."
Meinong, who does not develop the point in any detail, distinguishes
probability and weight as 'Intensität' and 'Qualität,'
and  is  inclined  to  regard  them  as  two  independent  dimensions  in 
which  the  judgment  is  free  to  move — they  are  the  two  dimensions 
of the 'Urteils-Continuum.' Nitsche regards the weight as being
the  measure  of  the  reliability  (Sicherheit)  of  the  probability,  and 
holds  that  the  probability  continually  approximates  to  its  true 
magnitude  (reale  Geltung)  as  the  weight  increases.  His  treatment 
is  too  brief  for  it  to  be  possible  to  understand  very  clearly  what 
he  means,  but  his  view  seems  to  resemble  the  theory  already 
discussed  that  an  argument  of  high  weight  is  '  more  likely  to  be 
right '  than  one  of  low  weight. 

1  There  are  also  some  remarks  by  Czuber  (Wahrscheinlichkeitsrechnung, 
vol. i. p. 202) on the Erkenntniswert of probabilities obtained by different
methods,  which  may  have  been  intended  to  have  some  bearing  on  it. 


CHAPTER  VII 

HISTORICAL    RETROSPECT 

1.  THE  characteristic  features  of  our  Philosophy  of  Probability 
must  be  determined  by  the  solutions  which  we  offer  to  the 
problems  attacked  in  Chapters  III.  and  IV.  Whilst  a  great  part 
of  the  logical  calculus,  which  will  be  developed  in  Part  II.,  would 
be  applicable  with  slight  modification  to  several  distinct  theories 
of  the  subject,  the  ultimate  problems  of  establishing  the  premisses 
of  the  calculus  bring  into  the  light  every  fundamental  difference 
of  opinion. 

These  problems  are  often,  for  this  reason  perhaps,  left  on  one 
side  by  writers  whose  interest  chiefly  lies  in  the  more  formal  parts 
of  the  subject.  But  Probability  is  not  yet  on  so  sound  a  basis 
that  the  formal  or  mathematical  side  of  it  can  be  safely  developed 
in  isolation,  and  some  attempts  have  naturally  been  made  to 
solve  the  problem  which  Bishop  Butler  sets  to  the  logician  in  the 
concluding  words  of  the  brief  discussion  on  probability  with 
which he prefaces the Analogy.1

In  this  chapter,  therefore,  we  will  review  in  their  historical 
order  the  answers  of  Philosophy  to  the  questions,  how  we  know 
relations  of  probability,  what  ground  we  have  for  our  judgments, 
and  by  what  method  we  can  advance  our  knowledge. 

2.  The  natural  man  is  disposed  to  the  opinion  that  probability 
is  essentially  connected  with  the  inductions  of  experience  and, 
if  he  is  a  little  more  sophisticated,  with  the  Laws  of  Causation 

1 "It is not my design to inquire further into the nature, the foundation and
measure of probability; or whence it proceeds that likeness should beget that
presumption, opinion and full conviction, which the human mind is formed
to receive from it, and which it does necessarily produce in every one; or to
guard  against  the  errors  to  which  reasoning  from  analogy  is  liable.  This 
belongs  to  the  subject  of  logic,  and  is  a  part  of  that  subject  which  has  not  yet 
been  thoroughly  considered." 

and of the Uniformity of Nature. As Aristotle says, "the
probable  is  that  which  usually  happens."  Events  do  not  always 
occur  in  accordance  with  the  expectations  of  experience  ;  but 
the  laws  of  experience  afford  us  a  good  ground  for  supposing 
that  they  usually  will.  The  occasional  disappointment  of  these 
expectations  prevents  our  predictions  from  being  more  than 
probable  ;  but  the  ground  of  their  probability  must  be  sought  in 
this  experience,  and  in  this  experience  only. 

This  is,  in  substance,  the  argument  of  the  authors  of  the  Port 
Royal  Logic  (1662),  who  were  the  first  to  deal  with  the  logic 
of  probability  in  the  modern  manner  :  "In  order  for  me  to 
judge  of  the  truth  of  an  event,  and  to  be  determined  to  believe 
it  or  not  believe  it,  it  is  not  necessary  to  consider  it  abstractly, 
and  in  itself,  as  we  should  consider  a  proposition  in  geometry  ; 
but  it  is  necessary  to  pay  attention  to  all  the  circumstances 
which  accompany  it,  internal  as  well  as  external.  I  call  internal 
circumstances  those  which  belong  to  the  fact  itself,  and  external 
those  which  belong  to  the  persons  by  whose  testimony  we  are  led 
to  believe  it.  This  being  done,  if  all  the  circumstances  are 
such  that  it  never  or  rarely  happens  that  the  like  circumstances 
are  the  concomitants  of  falsehood,  our  mind  is  led,  naturally, 
to  believe  that  it  is  true."1  Locke  follows  the  Port  Royal 
Logicians  very  closely  :  "  Probability  is  likeliness  to  be  true.  .  .  . 
The  grounds  of  it  are,  in  short,  these  two  following.  First,  the 
conformity  of  anything  with  our  own  knowledge,  observation, 
and experience. Secondly, the testimony of others, vouching
their  observation  and  experience  "  ;  2  and  essentially  the  same 
opinion  is  maintained  by  Bishop  Butler  :  "  When  we  determine 
a  thing  to  be  probably  true,  suppose  that  an  event  has  or  will 
come  to  pass,  it  is  from  the  mind's  remarking  in  it  a  likeness  to 
some  other  event,  which  we  have  observed  has  come  to  pass. 
And  this  observation  forms,  in  numberless  instances,  a  pre 
sumption,  opinion,  or  full  conviction  that  such  event  has  or  will 
come  to  pass."  3 

Against  this  view  of  the  subject  the  criticisms  of  Hume  were 
directed  :  "  The  idea  of  cause  and  effect  is  derived  from  experi 
ence,  which  informs  us,  that  such  particular  objects,  in  all  past 

1  Eng.  Trans.,  p.  353. 

2 An Essay concerning Human Understanding, book iv. "Of Knowledge and
Opinion."

3 Introduction to the Analogy.
instances, have been constantly conjoined with each other. . . .
According  to  this  account  of  things  .  .  .  probability  is  founded 
on  the  presumption  of  a  resemblance  betwixt  those  objects,  of 
which  we  have  had  experience,  and  those,  of  which  we  have  had 
none  ;  and  therefore  'tis  impossible  this  presumption  can  arise 
from probability."1 "When we are accustomed to see two impressions
conjoined together, the appearance or idea of the one immediately
carries us to the idea of the other. . . . Thus all probable
reasoning is nothing but a species of sensation. 'Tis not
solely in poetry and music, we must follow our taste and sentiment,
but likewise in philosophy. When I am convinced of any
principle, 'tis only an idea, which strikes more strongly upon me.
When I give the preference to one set of arguments above another,
I do nothing but decide from my feeling concerning the superiority
of their influence."2 Hume, in fact, points out that, while
it is true that past experience gives rise to a psychological anticipation
of some events rather than of others, no ground has been
given  for  the  validity  of  this  superior  anticipation. 

3.  But  in  the  meantime  the  subject  had  fallen  into  the  hands 
of  the  mathematicians,  and  an  entirely  new  method  of  approach 
was  in  course  of  development.  It  had  become  obvious  that 
many  of  the  judgments  of  probability  which  we  in  fact  make 
do  not  depend  upon  past  experience  in  a  way  which  satisfies  the 
canons  laid  down  by  the  Port  Royal  Logicians  or  by  Locke.  In 
particular, alternatives are judged equally probable, without
there being necessarily any actual experience of their approximately
equal frequency of occurrence in the past. And, apart
from this, it is evident that judgments based on a somewhat
indefinite experience of the past do not easily lend themselves
to precise numerical appraisement. Accordingly James
Bernoulli,3 the real founder of the classical school of mathematical
probability, while not repudiating the old test of experience, had
based many of his conclusions on a quite different criterion, the
rule which I have named the Principle of Indifference. The
traditional  method  of  the  mathematical  school  essentially 
depends  upon  reducing  all  the  possible  conclusions  to  a  number 
of  '  equi-probable  cases.'  And,  according  to  the  Principle  of 

1 Treatise of Human Nature, p. IJ'Jl (Green's edition).
2 Op. cit. p. 403.

3 See especially Ars Conjectandi, p. 224. Cf. Laplace, Théorie analytique,
p. 178.
Indifference, 'cases' are held to be equi-probable when there
is  no  reason  for  preferring  any  one  to  any  other,  when  there  is 
nothing,  as  with  Buridan's  ass,  to  determine  the  mind  in  any  one 
of  the  several  possible  directions.  To  take  Czuber's  example 
of  dice,1  this  principle  permits  us  to  assume  that  each  face  is 
equally  likely  to  fall,  if  there  is  no  reason  to  suppose  any  particular 
irregularity,  and  it  does  not  require  that  we  should  know  that  the 
construction  is  regular,  or  that  each  face  has,  as  a  matter  of  fact, 
fallen  equally  often  in  the  past. 

On this Principle, extended by Bernoulli beyond those
problems of gaming in which by its tacit assumption Pascal
and Huyghens had worked out a few simple exercises, the whole
fabric of mathematical probability was soon allowed to rest.
The older criterion of experience, never repudiated, was soon
subsumed under the new doctrine. First, in virtue of Bernoulli's
famous Law of Great Numbers, the fractions representing the
probabilities of events were thought to represent also the actual
proportion of their occurrences, so that experience, if it were
considerable, could be translated into the cyphers of arithmetic.
And next, by the aid of the Principle of Indifference, Laplace
established his Law of Succession by which the influence of any
experience, however limited, could be numerically measured, and
which purported to prove that, if B has been seen to accompany
A twice, it is two to one that B will again accompany A on A's
next appearance. No other formula in the alchemy of logic
has exerted more astonishing powers. For it has established
the existence of God from the premiss of total ignorance; and it
has measured with numerical precision the probability that the
sun will rise to-morrow.
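Bernoulli's theorem underlies the Law of Great Numbers mentioned above. In modern terms, it says that, for any fixed margin, the probability that the observed proportion of successes falls within that margin of the true probability tends to 1 as the number of trials grows. The following is a minimal sketch that computes this probability exactly under a binomial model. It assumes independent trials with a constant probability p. The function name `prob_within` and the values chosen are illustrative and are not taken from the text.

```python
from fractions import Fraction
from math import comb


def prob_within(n: int, p: Fraction, eps: Fraction) -> Fraction:
    """Exact probability that the observed proportion of successes in n
    independent trials, each succeeding with probability p, lies within
    eps of p (binomial model assumed)."""
    if n < 1:
        raise ValueError("need at least one trial")
    total = Fraction(0)
    for k in range(n + 1):
        if abs(Fraction(k, n) - p) <= eps:
            # Binomial term: C(n, k) * p^k * (1 - p)^(n - k)
            total += comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


if __name__ == "__main__":
    p, eps = Fraction(1, 2), Fraction(1, 10)
    for n in (10, 100, 1000):
        # Probability that the frequency lies within 1/10 of 1/2; it grows with n.
        print(n, float(prob_within(n, p, eps)))
```

Note the direction of the computation. It runs from an assumed probability to the likely frequencies. It does not, by itself, license the inverse step from observed frequencies back to a probability.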

Yet the new principles did not win acceptance without
opposition. D'Alembert,2 Hume, and Ancillon3 stand out as
the sceptical critics of probability, against the credulity of

1 Wahrscheinlichkeitsrechnung, p. 9.

2 D'Alembert's scepticism was directed towards the current mathematical
theory only, and was not, like Hume's, fundamental and far-reaching. His
opposition to the received opinions was, perhaps, more splendid than discriminating.

3 Ancillon's communication to the Berlin Academy in 1794, entitled Doutes
sur les bases du calcul des probabilités, is not as well known as it deserves to
be. He writes as a follower of Hume, but adds much that is original and
interesting. An historian, who also wrote on a variety of philosophical subjects,
Ancillon was, at one time, the Prussian Minister of Foreign Affairs.
eighteenth-century philosophers who were ready to swallow
without too many questions the conclusions of a science which
claimed and seemed to bring an entire new field within the
dominion of Reason.1

The first effective criticism came from Hume, who was also
the first to distinguish the method of Locke and the philosophers
from the method of Bernoulli and the mathematicians. "Probability,"
he says, "or reasoning from conjecture, may be divided
into two kinds, viz. that which is founded on chance and that which
arises from causes."2 By these two kinds he evidently means the
mathematical method of counting the equal chances based on
Indifference, and the inductive method based on the experience
of uniformity. He argues that 'chance' alone can be the
foundation of nothing, and "that there must always be a mixture
of causes among the chances, in order to be the foundation of
any reasoning."3 His previous argument against probabilities,
which were based on an assumption of cause, is thus extended
to the mathematical method also.

But the great prestige of Laplace and the 'verifications'
of his principles which his more famous results were supposed
to supply had, by the beginning of the nineteenth century,
established the science on the Principle of Indifference in an
almost unquestioned position. It may be noted, however, that
De Morgan, the principal student of the subject in England,
seems to have regarded the method of actual experiment and
the method of counting cases, which were equally probable
on grounds of Indifference, as alternative methods of equal
validity.

4. The reaction against the traditional teaching during the
past hundred years has not possessed sufficient force to displace

1 French philosophy of the latter half of the eighteenth century was profoundly
affected by the supposed conquests of the Calculus of Probability in
all fields of thought. Nothing seemed beyond its powers of prediction, and
it almost succeeded in men's minds to the place previously occupied by
Revelation. It was under these influences that Condorcet evolved his doctrine
of the perfectibility of the human race. The continuity and oneness of
modern European thought may be illustrated, if such things amuse the
reader, by the reflection that Condorcet derived from Bernoulli, that Godwin
was inspired by Condorcet, that Malthus was stimulated by Godwin's folly
into stating his famous doctrine, and that from the reading of Malthus
on Population Darwin received his earliest impulse.

2 Treatise of Human Nature, p. 424 (Green's edition).

3 Op. cit. p. 42f>.
the established doctrine, and the Principle of Indifference is
still very widely accepted in an unqualified form. Criticism
has proceeded along two distinct lines; the one, originated by
Leslie Ellis, and developed by Dr. Venn, Professor Edgeworth,
and Professor Karl Pearson, has been almost entirely confined
in its influence to England; the other, of which the beginnings
are to be seen in Boole's Laws of Thought, has been developed
in Germany, where its ablest exponent has been Von Kries.
France has remained uninfluenced by either, and faithful, on
the whole, to the tradition of Laplace. Even Henri Poincaré,
who had his doubts, and described the Principle of Indifference
as "very vague and very elastic," regarded it as our only
guide in the choice of that convention, "which has always
something arbitrary about it," but upon which calculation in
probability invariably rests.1

5. Before following up in detail these two lines of development,
I will summarise again the earlier doctrine with which the
leaders of the new schools found themselves confronted.

The earlier philosophers had in mind in dealing with probability
the application to the future of the inductions of experience,
to the almost complete exclusion of other problems. For the
data of probability, therefore, they looked only to their own
experience and to the recorded experiences of others; their
principal refinement was to distinguish these two grounds, and
they did not attempt to make a numerical estimate of the chances.
The mathematicians, on the other hand, setting out from the
simple problems presented by dice and playing cards, and

1 Poincaré's opinions on Probability are to be found in his Calcul des Probabilités
and in his Science et Hypothèse. Neither of these books appears
to me to be in all respects a considered work, but his view is sufficiently novel
to be worth a reference. Briefly, he shows that the current mathematical
definition is circular, and argues from this that the choice of the particular
probabilities, which we are to regard as initially equal before the application of
our mathematics, is entirely a matter of 'convention.' Much epigram is,
therefore, expended in pointing out that the study of probability is no more
than a polite exercise, and he concludes: "Le calcul des probabilités offre une
contradiction dans les termes mêmes qui servent à le désigner, et, si je ne craignais
de rappeler ici un mot trop souvent répété, je dirais qu'il nous enseigne
surtout une chose; c'est de savoir que nous ne savons rien." [The calculus of
probabilities involves a contradiction in the very terms used to name it, and, were
I not afraid of recalling here a saying too often repeated, I would say that it
teaches us above all one thing: to know that we know nothing.] On the other
hand, the greater part of his book is devoted to working out instances of practical
application, and he speaks of 'metaphysics' legitimising particular conventions.
How this comes about is not explained. He seems to endeavour to
save his reputation as a philosopher by the surrender of probability as a valid
conception, without at the same time forfeiting his claim as a mathematician
to work out probable formulae of practical importance.
requiring for the application of their methods a basis of numerical
measurement, dwelt on the negative rather than the positive
side of their evidence, and found it easier to measure equal
degrees of ignorance than equivalent quantities of experience.
This led to the explicit introduction of the Principle of Indifference,
or, as it was then termed, the Principle of Non-Sufficient Reason.
The great achievement of the eighteenth century was, in the eyes
of the early nineteenth, the reconciliation of the two points of
view and the measurement of probabilities, which were grounded
on experience, by a method whose logical basis was the Principle
of Non-Sufficient Reason. This would indeed have been a very
astonishing discovery, and would, as its authors declared, have
gradually brought almost every phase of human activity within
the power of the most refined mathematical analysis.

But it was not long before more sceptical persons began to
suspect that this theory proved too much. Its calculations, it
is true, were constructed from the data of experience, but the
more simple and the less complex the experience, the better satisfied
was the theory. What was required was not a wide experience
or detailed information, but a completeness of symmetry in
the little information there might be. It seemed to follow from
the Laplacian doctrine that the primary qualification for one
who would be well informed was an equally balanced ignorance.

6. The obvious reaction from a teaching, which seemed to
derive from abstractions results relevant to experience, was into
the arms of empiricism; and in the state of philosophy at that
time England was the natural home of this reaction. The first
protest, of which I am aware, came from Leslie Ellis in 1842.1
At the conclusion of his Remarks on an alleged proof of the Method
of least squares,2 "Mere ignorance," he says, "is no ground
for any inference whatever. Ex nihilo nihil." [Nothing comes from
nothing.] In Venn's Logic of Chance Ellis's suggestions are developed
into a complete theory:3 "Experience is our sole guide. If we want to discover
what is in reality a series of things, not a series of our own conceptions,
we must appeal to the things themselves to obtain it, for
we cannot find much help elsewhere." Professor Edgeworth4
was an early disciple of the same school: "The probability," he

1 On the Foundations of the Theory of Probabilities.

2 Republished in Miscellaneous Writings.
3 Logic of Chance, p. 74.

4 Metretike, p. 4.
says, "of head occurring n times if the coin is of the ordinary
make is approximately at least (1/2)^n. This value is rigidly deducible
from positive experience, the observations made by gamesters,
the experiments recorded by Jevons and De Morgan."

The doctrines of the empirical school will be examined in
Chapter VIII., and I postpone my detailed criticism to that
chapter. Venn rejects the applications of Bernoulli's theorem,
which he describes as "one of the last remaining relics of Realism,"
as well as the later Laplacian Law of Succession, thus destroying
the link between the empirical and the a priori methods. But,
apart from this, his view that statements of probability are
simply a particular class of statements about the actual world
of phenomena, would have led him to a closer dependence on
actual experience. He holds that the probability of an event's
having a certain attribute is simply the fraction expressing the
proportion of cases in which, as a matter of actual fact, this
attribute is present. Our knowledge, however, of this proportion
is often reached inductively, and shares the uncertainty to which
all inductions are liable. And, besides, in referring an event to
a series we do not postulate that all the members of the series
should be identical, but only that they should not be known to
differ in a relevant manner. Even on this theory, therefore, we
are not solely determined by positive knowledge and the direct
data of experience.
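Venn's definition can be written as a simple ratio over a finite series. The following is a minimal sketch that treats the series as a list and the attribute as a predicate. The names `frequency` and `RECORDS`, and the records themselves, are hypothetical and invented for illustration. The sketch also shows that the value depends on which series the event is referred to. That is why the choice of series turns on its members not being known to differ in a relevant manner.

```python
from fractions import Fraction
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def frequency(series: Iterable[T], has_attribute: Callable[[T], bool]) -> Fraction:
    """Proportion of the members of a finite series that have the attribute."""
    members = list(series)
    if not members:
        raise ValueError("the reference series is empty")
    hits = sum(1 for m in members if has_attribute(m))
    return Fraction(hits, len(members))


# Hypothetical series, invented for illustration: (subgroup, attribute_present).
RECORDS: List[Tuple[str, bool]] = [
    ("north", True), ("north", True), ("north", False),
    ("south", True), ("south", False), ("south", False),
]

# The same attribute, referred to two different reference series.
whole_series = frequency(RECORDS, lambda r: r[1])
north_only = frequency([r for r in RECORDS if r[0] == "north"], lambda r: r[1])

if __name__ == "__main__":
    print(whole_series, north_only)  # prints: 1/2 2/3
```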

7. The Empirical School, in their reaction against the pretentious
results which the Laplacian theory affected to develop
out of nothing, have gone too far in the opposite direction. If
our experience and our knowledge were complete, we should
be beyond the need of the Calculus of Probability. And where
our experience is incomplete, we cannot hope to derive from it
judgments of probability without the aid either of intuition or of
some further a priori principle. Experience, as opposed to intuition,
cannot possibly afford us a criterion by which to judge
whether on given evidence the probabilities of two propositions
are or are not equal.

However essential the data of experience may be, they cannot
by themselves, it seems, supply us with what we want. Czuber,1
who prefers what he calls the Principle of Compelling Reason
(das Prinzip des zwingenden Grundes), and holds that Probability

1 Wahrscheinlichkeitsrechnung, p. 11.
has an objective and not merely formal interpretation only when
it is grounded on definite knowledge, is rightly compelled to
admit that we cannot get on altogether without the Principle of
Non-Sufficient Reason. On the grounds both of its own intuitive
plausibility and of that of some of the conclusions for which it
is necessary, we are inevitably led towards this principle as a
necessary basis for judgments of probability. In some sense,
judgments of probability do seem to be based on equally balanced
degrees of ignorance.

8. It is from this starting-point that the German logicians
have set out. They have perceived that there are few judgments
of probability which are altogether independent of some principle
resembling that of Non-Sufficient Reason. But they also apprehend,
with Boole, that this may be a very arbitrary method of
procedure.

It was pointed out in § 18 of Chapter IV. that the cases, in
which the Principle of Indifference (or Non-Sufficient Reason)
breaks down, have a great deal in common, and that we break
up the field of possibility into a number of areas, actually unequal,
but indistinguishable on the evidence. Several German logicians,
therefore, have endeavoured to determine some rule by which
it might be possible to postulate actual equality of area for the
fields of the various possibilities.

By far the most complete and closely reasoned solution on
these lines is that of Von Kries.1 He is primarily anxious to discover
a proper basis for the numerical measurement of probabilities,
and he is thus led to examine with care the grounds of valid
judgments of equiprobability. His criticisms of the Principle
of Non-Sufficient Reason are searching, and, to meet them, he
elaborates a number of qualifying conditions which are, he
argues, necessary and sufficient. The value of his book, however,
lies, in the opinion of the present writer, in the critical rather
than in the constructive parts. The manner in which his qualifying
conditions are expressed is often, to an English reader at any
rate, somewhat obscure, and he seems sometimes to cover up
difficulties, rather than solve them, by the invention of new
technical terms. These characteristics render it difficult to
expound him adequately in a summary, and the reader must be

1 Die Principien der Wahrscheinlichkeitsrechnung. Eine logische Untersuchung.
Freiburg, 1886.
referred to the original for a proper exposition of the Doctrine of
Spielräume. Briefly, but not very intelligibly perhaps, he may
be said to hold that the hypotheses for the probabilities of which
we wish to obtain a numerical comparison, must refer to 'fields'
(Spielräume) which are 'indifferent,' 'comparable' in magnitude,
and 'original' (ursprünglich). Two fields are 'indifferent' if
they are equal before the Principle of Non-Sufficient Reason;
they are 'comparable' if it is true that the fields are actually
of equal extent; and they are 'original' or ultimate if they are
not derived from some other field. The last condition is exceedingly
obscure, but it seems to mean that the objects with which
we are ultimately dealing must be directly represented by the
'fields' of our hypotheses, and there must not be merely correlation
between these objects and the objects of the fields. The
qualification of comparability is intended to deal with difficulties
such as that connected with the population of different areas of
unknown extent; and the qualification of originality with those
arising from indirect measurement, as in the case of specific
density.
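The difficulty of indirect measurement can be made concrete. Suppose, purely for illustration, that the specific volume v of a substance is known only to lie between 1 and 3. Its specific density 1/v then lies between 1/3 and 1. Applying the Principle of Indifference to v, and then to 1/v instead, assigns different probabilities to one and the same event. The function `uniform_prob` and the numerical range are assumptions of this sketch.

```python
from fractions import Fraction


def uniform_prob(lo: Fraction, hi: Fraction, a: Fraction, b: Fraction) -> Fraction:
    """P(a <= X <= b) when X is taken to be uniformly distributed on [lo, hi]."""
    a, b = max(a, lo), min(b, hi)  # clip the event to the known range
    return max(Fraction(0), (b - a) / (hi - lo))


# Assumed illustrative range: specific volume v lies between 1 and 3,
# so specific density d = 1/v lies between 1/3 and 1.
V_LO, V_HI = Fraction(1), Fraction(3)
D_LO, D_HI = 1 / V_HI, 1 / V_LO

# One event: "v lies between 1 and 2", i.e. "d lies between 1/2 and 1".
p_uniform_in_volume = uniform_prob(V_LO, V_HI, Fraction(1), Fraction(2))
p_uniform_in_density = uniform_prob(D_LO, D_HI, 1 / Fraction(2), 1 / Fraction(1))

if __name__ == "__main__":
    print(p_uniform_in_volume, p_uniform_in_density)  # prints: 1/2 3/4
```

Because 1/v is not a linear function of v, equal intervals of volume do not correspond to equal intervals of density. Indifference applied to the two quantities therefore yields 1/2 and 3/4 for the same event. This is the kind of case the condition of originality is meant to settle.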

Von Kries's solution is highly suggestive, but it does not seem,
so far as I understand it, to supply an unambiguous criterion
for all cases. His discussion of the philosophical character of
probability is brief and inadequate, and the fundamental error
in his treatment of the subject is the physical, rather than logical,
bias which seems to direct the formulation of his conditions.
The condition of Ursprünglichkeit (originality), for instance, seems to depend
upon physical rather than logical criteria, and is, as a result,
much more restricted in its applicability than a condition, which
will really meet the difficulties of the case, ought to be. But,
although I differ from him in his philosophical conception of
probability, the treatment of the Principle of Indifference, which
fills the greater part of his book, is, I think, along fruitful lines,
and I have been deeply indebted to it in formulating my own
conditions in Chapter IV.

Of less closely reasoned and less detailed treatments, which
aim at the same kind of result, those of Sigwart and Lotze are
worth noticing. Sigwart's1 position is sufficiently explained by
the following extract: "The possibility of a mathematical treatment
lies primarily in the fact that in the disjunctive judgment

1 Sigwart, Logic (Eng. edition), vol. ii. p. 220.
the number of terms in the disjunction plays a decisive part.
Inasmuch as a limited number of mutually exclusive possibilities
is presented, of which one alone is actual, the element
of number forms an essential part of our knowledge. . . . Our
knowledge must enable us to assume that the particular terms of
the disjunction are so far equivalent that they express an equal
degree of specialisation of a general concept, or that they cover
equal parts of the whole extension of the concept. . . . This
equivalence is most intuitable where we are dealing with equal
parts of a spatial area, or equal parts of a period of time. . . .
But even where this obvious quality is not forthcoming, we may
ground our expectations upon a hypothetical equivalence, where
we see no reason for considering the extent of one possibility to
be greater than that of the others. . . ."

In the beginning of this passage Sigwart seems to be aware
of the fundamental difficulty, although exception may be taken
to the vagueness of the phrase, "equal degree of specialisation of
a general concept." But in the last sentence quoted he surrenders
the advantages he has gained in the earlier part of his explanation,
and, instead of insisting on a knowledge of an equal degree
of specialisation, he is satisfied with an absence of any knowledge
to the contrary. Hence, in spite of his initial qualifications, he
ends unrestrainedly in the arms of Non-Sufficient Reason.1

Lotze,2 in a brief discussion of the subject, throws out some
remarks well worth quoting: "We disclaim all knowledge of
the circumstances which condition the real issue, so that when
we talk of equally possible cases we can only mean coordinated as
equivalent species in the compass of an universal case; that is to
say, if we enumerate the special forms, which the genus can
assume, we get a disjunctive judgment of the form: if the condition
B is fulfilled, one of the kinds f1, f2, f3, ... of the universal
consequent F will occur to the exclusion of the rest. Which of
all those different consequents will, in fact, occur, depends in all
cases on the special form b1, b2, ... in which that universal
condition is fulfilled. ... A coordinated case is a case which
answers to one and only one of the mutually exclusive values
b1, b2, . . . of the condition B, and these rival values may occur in

1 Sigwart's treatment of the subject of probability is curiously inaccurate.
Of his four fundamental rules of probability, for instance, three are, as he states
them, certainly false.

2 Lotze, Logic (Eng. edition), pp. .'{til, :{iir>.
reality; it does not answer to a more general form B of this
condition, which can never exist in reality, because it embraces
several of the particular values b1, b2, . . ."

This certainly meets some of the difficulties, and its resemblance
to the conditions formulated in Chapter IV. will be evident
to the careful reader. But it is not very precise, and not easily
applicable to all cases, to those, for instance, of the measurement
of continuous quantity. By combining the suggestions of
Von Kries, Sigwart, and Lotze, we might, perhaps, patch up a
fairly comprehensive rule. We might say, for instance, that if
b1 and b2 are classes, their members must be finite in number and
enumerable or they must compose stretches; that, if they are
finite in number, they must be equal in number; and that, if
their members compose stretches, the stretches must be equal
stretches; and that if b1 and b2 are concepts, they must represent
concepts of an equal degree of specialisation. But qualifications
so worded would raise almost as many difficulties as they solved.
How, for instance, are we to know when concepts are of an equal
degree of specialisation?

9. That probability is a relation has often received incidental
recognition from logicians, in spite of the general failure to place
proper emphasis on it. The earliest writer, with whom I am
acquainted, explicitly to notice this, is Kahle in his Elementa
logicae Probabilium methodo mathematica in usum Scientiarum
et Vitae adornata published at Halle in 1735.1 Amongst more
recent writers casual statements are common to the effect that
the probability of a conclusion is relative to the grounds upon
which it is based. Take Boole2 for instance: "It is implied in
the definition that probability is always relative to our actual

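1 This work, which seems to have soon fallen into complete neglect and is
now extremely rare, is full of interest and original thought. The following
quotations will show the fundamental position taken up: "Est cognitio probabilis,
si desunt quaedam requisita ad veritatem demonstrativam (p. 15).
Propositio probabilis esse potest falsa, et improbabilis esse potest vera; ergo
cognitio hodie possibilis, crastina luce mutari potest improbabilem, si accedunt
reliqua requisita omnia, in certitudinem (p. 26). . . . Certitudo est terminus
relativus: considerare potest ratione representationum in intellectu nostro.
. . . Incerta nobis dependent a defectu cognitionis (p. 35). . . . Actionem
imprudenter et contra regulas probabilitatis susceptam eventus felix sequi
potest. Ergo prudentia actionum ex successu solo non est aestimanda (p. 62).
. . . Logica probabilium est scientia dijudicandi gradum certitudinis eorum,
quibus desunt requisita ad veritatem demonstrativam (p. 94)." [Knowledge is
probable if some of the requisites for demonstrative truth are lacking. A probable
proposition may be false, and an improbable one may be true; hence knowledge
that is possible today may tomorrow become improbable or, if all the remaining
requisites are supplied, certain. . . . Certainty is a relative term: it can be
considered with respect to the representations in our understanding. . . . What
is uncertain to us depends on a defect of knowledge. . . . A happy outcome may
follow an action undertaken imprudently and against the rules of probability;
therefore the prudence of actions is not to be judged from success alone. . . .
The logic of probabilities is the science of judging the degree of certainty of
those things which lack the requisites for demonstrative truth.]

2 "On a General Method in the Theory of Probabilities," Phil. Mag., 4th
Series, viii., 1854. See also, "On the Application of the Theory of Probabilities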
state of information and varies with that state of information."
Or Bradley:1 "Probability tells us what we ought to believe,
what we ought to believe on certain data . . . Probability is no
more 'relative' and 'subjective' than is any other act of
logical inference from hypothetical premises. It is relative to
the data with which it has to deal, and is not relative in any other
sense." Or even Laplace, when he is explaining the diversity
of human opinions: "Dans les choses qui ne sont que vraisemblables,
la différence des données que chaque homme a sur elles,
est une des causes principales de la diversité des opinions que
l'on voit régner sur les mêmes objets . . . c'est ainsi que le
même fait, récité devant une nombreuse assemblée, obtient divers
degrés de croyance, suivant l'étendue des connaissances des
auditeurs."2 [In matters which are only probable, the difference in the
data which each person has about them is one of the principal causes of the
diversity of opinions that we see prevail on the same subjects . . . thus the
same fact, related before a large assembly, obtains various degrees of belief,
according to the extent of the hearers' knowledge.]

10. Here we may leave this account of the various directions
in which progress has seemed possible, with the hope that it may
assist the reader, who is dissatisfied with the solution proposed in
Chapter IV., to determine the line of argument along which he
is likeliest to discover the solution of a difficult problem.


to the Question of the Combination of Testimonies or Judgments" (Edin. Phil.
Trans. xxi. p. 600): "Our estimate of the probability of an event varies not
absolutely with the circumstances which actually affect its occurrence, but with
our knowledge of those circumstances."

1 The Principles of Logic, p. I'HS.

2 Essai philosophique, p. 7.


CHAPTER VIII

THE FREQUENCY THEORY OF PROBABILITY

1. THE theory of probability, outlined in the preceding chapters,
has serious difficulties to overcome. There is a theoretical, as
well as a practical, difficulty in measuring or comparing degrees
of probability, and a further difficulty in determining them
a priori. We must now examine an alternative theory which is
much freer from these troubles, and is widely held at the present
time.

2. The theory is in its essence a very old one. Aristotle
foreshadowed it when he held that "the probable is that which
for the most part happens";1 and, as we have seen in Chapter
VII., an opinion not unlike this was entertained by those philosophers
of the seventeenth and eighteenth centuries who approached
the problems of probability uninfluenced by the work of mathematicians.
But the underlying conception of earlier writers
received at the hands of some English logicians during the latter
half of the nineteenth century a new and much more complicated
form.

The theory in question, which I shall call the Frequency
Theory of Probability, first appears2 as the basis of a proposed
logical scheme in a brief essay by Leslie Ellis On the Foundations
of the Theory of Probabilities, and is somewhat further developed
in his Remarks on the Fundamental Principles of the Theory of

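1 Rhet. i. 2, 1357 a 34.

2 I give Ellis the priority because his paper, published in 1843, was read on
Feb. 14, 1842. The same conception, however, is to be found in Cournot's
Exposition, also published in 1843: "La théorie des probabilités a pour objet
certains rapports numériques qui prendraient des valeurs fixes et complètement
déterminées, si l'on pouvait répéter à l'infini les épreuves des mêmes hasards,
et qui, pour un nombre fini d'épreuves, oscillent entre des limites d'autant plus
resserrées, d'autant plus voisines des valeurs finales, que le nombre des épreuves
est plus grand." [The theory of probabilities has as its object certain numerical
ratios which would take fixed and completely determinate values if the trials of
the same chances could be repeated indefinitely, and which, for a finite number of
trials, oscillate between limits that are narrower, and nearer to the final values,
the greater the number of trials.]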

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Probabilities.1  "  Jf  the  probability  of  a  given  event  be  correctly 
determined,"  he  says,  "the  event  will  on  a  long  run  of  trials  tend 
to  recur  with  frequency  proportional  to  their  probability.  This 
is  generally  proved  mathematically.  It  seems  to  me  to  be  true 
d  priori.  ...  1  have  been  unable  to  sever  the  judgment  that 
one  event  is  more  likely  to  happen  than  another  from  the  belief 
that  in  the  long  run  it  will  occur  more  frequently."  Ellis  ex 
plicitly  introduces  the  conception  that  probability  is  essentially 
concerned  with  a  group  or  series. 
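Ellis's claim can be illustrated, though not proved, by a short simulation. The sketch below is an illustration only: the function name `running_frequencies`, the probability 0.5, the number of trials and the seed are all assumptions chosen for the example. It presupposes a probability p fixed in advance and independent trials, so it shows what Ellis's claim predicts rather than why it should hold.

```python
import random


def running_frequencies(p: float, n_trials: int, seed: int = 0) -> list[float]:
    """Simulate n_trials independent events, each occurring with probability p,
    and return the relative frequency of occurrence after each trial."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    freqs = []
    for i in range(1, n_trials + 1):
        if rng.random() < p:
            hits += 1
        freqs.append(hits / i)
    return freqs


freqs = running_frequencies(0.5, 100_000)
# Print the relative frequency at a few points along the run.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(freqs[n - 1], 4))
```

Different seeds give different paths, but the later frequencies typically lie much closer to p than the early ones, which is the narrowing of limits described in Cournot's formulation.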

Although the priority of invention must be allowed to Leslie
Ellis, the theory is commonly associated with the name of Venn.
In his Logic of Chance2 it first received elaborate and systematic
treatment, and, in spite of his having attracted a number of
followers, there has been no other comprehensive attempt to
meet the theory's special difficulties or the criticisms directed
against it. I shall begin, therefore, by examining it in the form
in which Venn has expounded it. Venn's exposition is much
coloured by an empirical view of logic, which is not perhaps as
necessary to the essential part of his doctrine as he himself
implies, and is not shared by all of those who must be classed as
in general agreement with him about probability. It will be
necessary, therefore, to supplement a criticism of Venn by an
account of a more general frequency theory of probability,
divested of the empiricism with which he has clothed it.

3. The following quotations from Venn's Logic of Chance will
show the general drift of his argument: The fundamental
conception is that of a series (p. 4). The series is of events which
have a certain number of features or attributes in common (p. 10).
The characteristic distinctive of probability is this: the
occasional attributes, as distinguished from the permanent, are found
on an examination to tend to exist in a certain definite proportion
of the whole number of cases (p. 11). We require that there should
be in nature large classes of objects, throughout all the individual
members of which a general resemblance extends. For this
purpose the existence of natural kinds or groups is necessary
(p. 55). The distinctive characteristics of probability prevail
principally in the properties of natural kinds, both in the ultimate
and in the derivative or accidental properties (p. 63). The same
peculiarity prevails again in the force and frequency of most
natural agencies (p. 64). There seems reason to believe that it
is in such things only, as distinguished from things artificial, that
the property in question is to be found (p. 65). How, in any
particular case, are we to establish the existence of a probability
series? Experience is our sole guide. If we want to discover
what is in reality a series of things, not a series of our own
conceptions, we must appeal to the things themselves to obtain it,
for we cannot find much help elsewhere (p. 174). When
probability is divorced from direct reference to objects, as it
substantially is by not being founded upon experience, it simply resolves
itself into the common algebraical doctrine of Permutations
and Combinations (p. 87). By assigning an expectation in
reference to the individual, we mean nothing more than to make
a statement about the average of his class (p. 151). When we say
of a conclusion within the strict province of probability, that it
is not certain, all that we mean is that in some proportion of
cases only will such conclusion be right, in the other cases it will
be wrong (p. 210).

1 These essays were published in the Transactions of the Camb. Phil. Soc.,
the first in 1843 (vol. viii.), and the second in 1854 (vol. ix.). Both were
reprinted in Mathematical and other Writings (1863), together with three other
papers on Probability and the Method of Least Squares. All five are full of
spirit and originality, and are not now so well known as they deserve to be.

2 The first edition appeared in 1866. Revised editions were issued in 1876
and 1888. References are given to the third edition of 1888.

The essence of this theory can be expressed in a few words.
To say, that the probability of an event's having a certain
characteristic is x, is to mean that the event is one of a number of events,
a proportion x of which have the characteristic in question; and
the fact, that there is such a series of events possessing this
frequency in respect of the characteristic, is purely a matter of
experience to be determined in the same manner as any other
question of fact. That such series do exist happens to be a
characteristic of the real world as we know it, and from this
the practical importance of the calculation of probabilities is
derived.

Such a theory possesses manifest advantages. There is no
mystery about it: no new indefinables, no appeals to intuition.
Measurement leads to no difficulties; our probabilities or
frequencies are ordinary numbers, upon which the arithmetical
apparatus can be safely brought to bear. And at the same time it
seems to crystallise in a clear, explicit shape the floating opinion
of common sense that an event is or is not probable in certain
supposed circumstances according as it is or is not usual as a
matter of fact and experience.

The two principal tenets, then, of Venn's system are these:
that probability is concerned with series or groups of events,
and that all the requisite facts must be determined empirically,
a statement in probability merely summing up in a convenient
way a group of experiences. Aggregate regularity combined
with individual difference happens, he says, to be characteristic
of many events in the real world. It will often be the case,
therefore, that we can make statements regarding the average
of a certain class, or regarding its characteristics in the long run,
which we cannot make about any of its individual members
without great risk of error. As our knowledge regarding the
class as a whole may give us valuable guidance in dealing with an
individual instance, we require a convenient way of saying that
an individual belongs to a class in which certain characteristics
appear on the average with a known frequency; and this the
conventional language of probability gives us. The importance
of probability depends solely upon the actual existence of such
groups or real kinds in the world of experience, and a judgment
of probability must necessarily depend for its validity upon our
empirical knowledge of them.

4. It is the obvious, as well as the correct, criticism of such a
theory, that the identification of probability with statistical
frequency is a very grave departure from the established use of
words; for it clearly excludes a great number of judgments
which are generally believed to deal with probability. Venn
himself was well aware of this, and cannot be accused of supposing
that all beliefs, which are commonly called probable, are really
concerned with statistical frequency. But some of his followers,
to judge from their published work, have not always seen, so
clearly as he did, that his theory is not concerned with the same
subject as that with which other writers have dealt under the
same title. Venn justifies his procedure by arguing that no other
meaning, of which it is possible to take strict logical cognisance,
can reasonably be given to the term, and that the other meanings,
with which it has been used, have not enough in common to
permit their reduction to a single logical scheme. It is useless,
therefore, for a critic of Venn to point out that many supposed
judgments of probability are not concerned with statistical
frequency; for, as I understand the Logic of Chance, he admits
it; and the critic must show that the sense different from Venn's
in which the term probability is often employed has an important
logical interpretation about which we can generalise. This
position I seek to establish. It is, in my opinion, this other sense
alone which has importance; Venn's theory by itself has few
practical applications, and if we allow it to hold the field, we must
admit that probability is not the guide of life, and that in following
it we are not acting according to reason.

5. Part of the plausibility of Venn's theory is derived, I
think, from a failure to recognise the narrow limits of its
applicability, or to notice his own admissions regarding this. "In
every case," he says (p. 124), "in which we extend our inferences
by Induction or Analogy, or depend upon the witness of others,
or trust to our own memory of the past, or come to a conclusion
through conflicting arguments, or even make a long and
complicated deduction by mathematics or logic, we have a result of
which we can scarcely feel as certain as of the premisses from
which it was obtained. In all these cases, then, we are conscious
of varying quantities of belief, but are the laws according to which
the belief is produced and varied the same? If they cannot be
reduced to one harmonious scheme, if, in fact, they can at best be
brought to nothing but a number of different schemes, each with
its own body of laws and rules, then it is vain to endeavour to
force them into one science." All these cases, therefore, in which
we are 'not certain,' Venn explicitly excludes from what he
chooses to call the science of probability, and he pays no further
attention to them. The science of probability is, according to
him, no more than a method which enables us to express in a
convenient form statistical statements of frequency. "The
province of probability," he says again on page 160, "is not so
extensive as that over which variation of belief might be observed.
Probability only considers the case in which this variation is
brought about in a certain definite statistical way."1 He points
out on p. 194 that for the purposes of probability we must take
the statistical frequency from which we start ready made and
ask no questions about the process or completeness of its
manufacture: "It may be obtained by any of the numerous rules
furnished by Induction, or it may be inferred deductively, or
given by our own observation; its value may be diminished by
its depending upon the testimony of witnesses, or its being
recalled by our own memory. Its real value may be influenced
by these causes or any combinations of them; but all these are
preliminary questions with which we have nothing directly to do.
We assume our statistical proposition to be true, neglecting the
diminution of its value by the processes of attainment."

1 Edgeworth uses the term 'probability' widely, as I do; but he makes
a distinction corresponding to Venn's by limiting the subject-matter of the
Calculus of Probabilities. He writes ('Philosophy of Chance,' Mind, 1884,
p. 223): "The Calculus of Probabilities is concerned with the estimation of
degrees of probability; not every species of estimate, but that which is founded
on a particular standard. That standard is the phenomenon of statistical
uniformity: the fact that a genus can very frequently be subdivided into species
such that the number of individuals in each species bears an approximately
constant ratio to the number of individuals in the genus." This use of terms is
legitimate, though it is not easy to follow it consistently. But, like Venn's,
it leaves aside the most important questions. The Calculus of Probabilities,
thus interpreted, is no guide by itself as to which opinion we ought
to follow, and is not a measure of the weight we should attach to conflicting
arguments.

It must be recognised, therefore, that Venn has deliberately
excluded from his survey almost all the cases in which we regard
our judgments as 'only probable'; and, whatever the value or
consistency of his own scheme, he has left untouched a wide
field of study for others.

6. The main grounds, which have induced Venn to regard
judgments based on statistical frequency as the only cases of
probability which possess logical importance, seem to be two:
(i.) that other cases are mainly subjective, and (ii.) that they
are incapable of accurate measurement.

With regard to the first it must be admitted that there are
many instances in which variation of belief is occasioned by purely
psychological causes, and that his argument is valid against those
who have defined probability as measuring the degree of
subjective belief. But this has not been the usual way of
looking at the subject. Probability is the study of the
grounds which lead us to entertain a rational preference for
one belief over another. That there are rational grounds other
than statistical frequency, for such preferences, Venn does
not deny; he admits in the quotation given above that the
'real value' of our conclusion is influenced by many other
considerations than that of statistical frequency. Venn's theory,
therefore, cannot be fairly propounded by his disciples as
alternative to such a theory as is propounded here. For my Treatise is
concerned with the general theory of arguments from premisses
leading to conclusions which are reasonable but not certain;
and this is a subject which Venn has, deliberately, not treated
in the Logic of Chance.

7. Apart from two circumstances, it would scarcely be
necessary to say anything further; but in the first place some writers
have believed that Venn has propounded a complete theory
of probability, failing to realise that he is not at all concerned
with the sense in which we may say that one induction or analogy,
or testimony, or memory, or train of argument is more probable
than another; and in the second place he himself has not always
kept within the narrow limits, which he has himself laid down
as proper to his theory.

For he has not remained content with defining a probability
as identical with a statistical frequency, but has often spoken
as if his theory told us which alternatives it is reasonable to prefer.
When he states, for instance, that modality ought to be banished
from Logic and relegated to Probability (p. 296), he forgets his
own dictum that of premisses, the distinctive characteristic of
which is their lack of certainty, Probability takes account of
one class only, Induction concerning itself with another class, and
so forth (p. 321). He forgets also that, when he comes to consider
the practical use of statistical frequencies, he has to admit that
an event may possess more than one frequency, and that we must
decide which of these to prefer on extraneous grounds (p. 213).
The device, he says, must be to a great extent arbitrary, and there
are no logical grounds of decision; but would he deny that it is
often reasonable to found our probability on one statistical
frequency rather than on another? And if our grounds are
reasonable, are they not in an important sense logical?

Even in those cases, therefore, in which we derive our
preference for one alternative over another from a knowledge of
statistical frequencies, a statistical frequency by itself is insufficient
to determine us. We may call a statistical frequency a
probability, if we choose; but the fundamental problem of determining
which of several alternatives is logically preferable still awaits
solution. We cannot be content with the only counsel Venn
can offer, that we should choose a frequency which is derived
from a series neither too large nor too small.

The same difficulty, that a probability in Venn's sense is
insufficient to determine which alternative is logically preferable,
arises in another connection. In most cases the statistical
frequency is not given in experience for certain, but is arrived
at by a process of induction, and inductions, he admits, are not
certain. If, in the past, three infants out of every ten have
died in their first four years, induction may base on this the
doubtful assertion, All infants die in that proportion. But we
cannot assert on this ground, as Venn wishes to do, that the
probability of the death of an infant in its first four years is 3/10ths.
We can say no more than that it is probable (in my sense) that
there is such a probability (in his sense). For the purpose of
coming to a decision we cannot compare the value of this
conclusion with that of others until we know the probability
(in my sense) that the statistical frequency really is 3/10ths.
The cases in which we can determine the logical value of a
conclusion entirely on grounds of statistical frequency would
seem to be extremely few in number.

8. The second main reason which led Venn to develop his
theory is to be found in his belief that probabilities which are
based on statistical frequencies are alone capable of accurate
measurement. The term 'probabilities,' he argues, is properly
confined to the case of chances which can be calculated, and all
calculable chances can be made to depend upon statistical
frequency. In attempting to establish this latter contention
he is involved in some paradoxical opinions. "In many cases,"
he admits, "it is undoubtedly true that we do not resort to direct
experience at all. If I want to know what is my chance of
holding ten trumps in a game of whist, I do not enquire how
often such a thing has occurred before. ... In practice, à priori
determination is often easy, whilst à posteriori appeal to
experience would be not merely tedious but utterly impracticable."
But these cases which are usually based on the Principle of
Indifference can, he maintains, be justified on statistical grounds.
In the case of coin tossing there is a considerable experience of
the equally frequent occurrence of heads and tails; the
experience gained in this simple case is to be extended to the complex
cases by "Induction and Analogy." In one simple case the
result to which the Principle of Indifference would lead is that
which experience recommends. Therefore in complex cases,
where there is no basis of experiment at all, we may assume that
Experience, if experience there was, would speak with the same
voice as Indifference. This is to assert that, because in one case,
where there is no known reason to the contrary, there actually
is none, therefore in other cases incapable of verification the
absence of known reason to the contrary proves that actually
there is none.
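Venn's whist example is a case of à priori determination of exactly this kind, and the arithmetic can be sketched briefly. The sketch assumes that "ten trumps" means exactly ten trumps in a thirteen-card hand dealt from a fifty-two-card pack containing thirteen trumps (the "at least ten" reading is shown too), and that every possible hand is equally likely. That second assumption is where the Principle of Indifference enters: no record of past deals is consulted. The function name `trump_chance` is hypothetical.

```python
from fractions import Fraction
from math import comb


def trump_chance(k: int, hand: int = 13, trumps: int = 13, deck: int = 52) -> Fraction:
    """Exact chance that a hand of `hand` cards, dealt from `deck` cards of which
    `trumps` are trumps, contains exactly k trumps, counting every hand as
    equally likely."""
    favourable = comb(trumps, k) * comb(deck - trumps, hand - k)
    return Fraction(favourable, comb(deck, hand))


exactly_ten = trump_chance(10)
at_least_ten = sum(trump_chance(k) for k in range(10, 14))
print(float(exactly_ten), float(at_least_ten))
```

On these assumptions the chance of exactly ten trumps comes to a little over four in a million, obtained by counting hands rather than by observing games.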

The attempt to justify the rules of inverse probability on
statistical grounds I have failed to understand; and after a
careful reading, I am unable to produce an intelligible account of
the argument involved in the latter part of chapter vii. of the
Logic of Chance.1 I am doubtful whether Venn should not have
excluded a posteriori arguments in probability from his scheme
as well as inductive arguments. The attempt to include them
may have been induced by a desire to deal with all cases
in which numerical calculation has been commonly thought
possible.

9. The argument so far has been solely concerned with the
case for the frequency theory developed in the Logic of Chance.
The criticisms which follow will be directed against a more
general form of the same theory which may conceivably have
recommended itself to some readers. It is unfortunate that no
adherent of the doctrine, with the exception of Venn, has
attempted to present the theory of it in detail. Professor Karl
Pearson, for instance, probably agrees with Venn in a general
way only, and it is very likely that many of the foregoing remarks
do not apply to his view of probability; but while I generally
disagree with the fundamental premisses upon which his work
in probability and statistics seems to rest, I am not clearly
aware of the nature of the philosophical theory from which he
thinks that he derives them and which makes them appear to
him to be satisfactory. A careful exposition of his logical
presuppositions would greatly add to the completeness of his work.
In the meantime it is only possible to raise general objections to
any theory of probability which seeks to found itself upon the
conception of statistical frequency.

1 Let the reader, who is acquainted with this chapter, consider what precise
assumption Venn's reasoning requires on p. 187 in the example which seeks to
show the efficacy of Lord Lister's antiseptic treatment a posteriori. What is
the 'inevitable assumption about the bags' when it is translated into the
language of this example?

The generalised frequency theory which I propose to put
forward, as perhaps representative of what adherents of this
doctrine have in mind, differs from Venn's in several important
respects.1 In the first place, it does not regard probability as
being identical with statistical frequency, although it holds that
all probabilities must be based on statements of frequency, and
can be defined in terms of them. It accepts the theory that
propositions rather than events should be taken as the
subject-matter of probability; and it adopts the comprehensive view
of the subject according to which it includes induction and all
other cases in which we believe that there are logical grounds for
preferring one alternative out of a set none of which are certain.
Nor does it follow Venn in supposing any special connection to
exist between a frequency theory of probability and logical
empiricism.

10. A proposition can be a member of many distinct classes
of propositions, the classes being merely constituted by the
existence of particular resemblances between their members
or in some such way. We may know of a given proposition that
it is one of a particular class of propositions, and we may also
know, precisely or within defined limits, what proportion of this
class are true, without our being aware whether or not the given
proposition is true. Let us, therefore, call the actual proportion
of true propositions in a class the truth-frequency2 of the class,
and define the measure of the probability of a proposition relative
to a class, of which it is a member, as being equal to the
truth-frequency of the class.
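The definition can be made concrete with propositions whose truth is easy to check. In the sketch below, which is an illustration and not an example taken from the text, each proposition has the form "n is prime" and a class of such propositions is given by a range of values of n; the helper names `is_prime` and `truth_frequency` are hypothetical. We pretend not to know whether "97 is prime" is true, and measure its probability relative to three classes of which it is a member.

```python
from fractions import Fraction


def is_prime(n: int) -> bool:
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def truth_frequency(members) -> Fraction:
    """Proportion of true propositions in a class of propositions
    'n is prime', each proposition being represented by its n."""
    members = list(members)
    return Fraction(sum(is_prime(n) for n in members), len(members))


# Three classes of which the proposition '97 is prime' is a member.
classes = {
    "n is prime, 1 <= n <= 100": range(1, 101),
    "n is prime, n odd, 1 <= n <= 100": range(1, 101, 2),
    "n is prime, 90 <= n <= 100": range(90, 101),
}

for name, members in classes.items():
    assert 97 in members
    print(name + ":", truth_frequency(members))
# n is prime, 1 <= n <= 100: 1/4
# n is prime, n odd, 1 <= n <= 100: 12/25
# n is prime, 90 <= n <= 100: 1/11
```

The same proposition receives a different value relative to each class, and nothing in the definition itself says which class ought to be used.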

The fundamental tenet of a frequency theory of probability
is, then, that the probability of a proposition always depends
upon referring it to some class whose truth-frequency is known
within wide or narrow limits.

Such a theory possesses most of the advantages of Venn's,
but escapes his narrowness. There is nothing in it so far which
could not be easily expressed with complete precision in the
terms of ordinary logic. Nor is it necessarily confined to
probabilities which are numerical. In some cases we may know the
exact number which expresses the truth-frequency of our class;
but a less precise knowledge is not without value, and we may
say that one probability is greater than another, without knowing
how much greater, and that it is large or small or negligible, if
we have knowledge of corresponding accuracy about the
truth-frequencies of the classes to which the probabilities refer. The
magnitudes of some pairs of probabilities we shall be able to
compare numerically, others in respect of more and less only,
and others not at all. A great deal, therefore, of what has been
said in Chapter III. would apply equally to the present theory,
with this difference that the probabilities would, as a matter of
fact, have numerical values in all cases, and the less complete
comparisons would only hold the field in cases where the real
probabilities were partially unknown. On the frequency theory,
therefore, there is an important sense in which probabilities can
be unknown, and the relative vagueness of the probabilities
employed in ordinary reasoning is explained as belonging not
to the probabilities themselves but only to our knowledge of
them. For the probabilities are relative, not to our knowledge,
but to some objective class, possessing a perfectly definite
truth-frequency, to which we have chosen to refer them.

1 In what follows I am much indebted for some suggestions in favour of the
frequency theory communicated to me by Dr. Whitehead; but it is not to be
supposed that the exposition which follows represents his own opinion.

2 This is Dr. Whitehead's phrase.
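On this view every probability has a definite value, and only our knowledge of it may be incomplete. One way to model that, assumed here purely for illustration, is to record what we know of a truth-frequency as a closed interval; the names `FrequencyBounds`, `compare` and `ratio` are hypothetical. Two probabilities can then be compared numerically when both are known exactly, in respect of more and less when the intervals do not overlap, and not at all otherwise.

```python
from fractions import Fraction
from typing import NamedTuple, Optional


class FrequencyBounds(NamedTuple):
    """Our knowledge of a class's truth-frequency: it lies in [low, high]."""
    low: Fraction
    high: Fraction

    def exact(self) -> bool:
        return self.low == self.high


def ratio(a: FrequencyBounds, b: FrequencyBounds) -> Optional[Fraction]:
    """Numerical comparison, possible only when both values are known exactly."""
    if a.exact() and b.exact() and b.low != 0:
        return a.low / b.low
    return None


def compare(a: FrequencyBounds, b: FrequencyBounds) -> Optional[str]:
    """Comparison of more and less; None when the bounds leave it undecided."""
    if a.exact() and b.exact() and a.low == b.low:
        return "equal"
    if a.low > b.high:
        return "greater"
    if a.high < b.low:
        return "less"
    return None


p = FrequencyBounds(Fraction(1, 4), Fraction(1, 4))
q = FrequencyBounds(Fraction(1, 2), Fraction(1, 2))
r = FrequencyBounds(Fraction(3, 5), Fraction(9, 10))
s = FrequencyBounds(Fraction(1, 3), Fraction(2, 3))

print(ratio(q, p))    # 2
print(compare(r, p))  # greater
print(compare(r, s))  # None
```

Here q is exactly twice p; r is known to exceed p without our knowing by how much; and r and s cannot be compared at all, although on the frequency theory each has a definite value.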

The frequency theory expounded in this manner cannot easily
avoid mention of the relativity of probabilities which is implicit
here, as it is in Venn's. Whether or not the probability of a
proposition is relative to given data, it is clearly relative to the
particular class or series to which we choose to refer it. A given
proposition has a great variety of different probabilities
corresponding to each of the various distinct classes of which it is a
member; and before an intelligible meaning can be given to a
statement that the probability of a proposition is so-and-so, the
class must be specified to which the proposition is being referred.
Most adherents of the frequency theory would probably go
further, and agree that the class of reference must be determined
in any particular case by the data at our disposal. Here, then,
is another point on which it is not necessary for the frequency
theory to diverge from the theory of this Treatise. It should,
I think, be generally agreed by every school of thought that the
probability of a conclusion is in an important sense relative to
given premisses. On this issue and also on the point that our
knowledge of many probabilities is not numerically definite,
there might well be for the future an end of disagreement, and
disputation might be reserved for the philosophical interpretation
of these settled facts, which it is unreasonable to deny, however
we may explain them.

11. I now proceed to those contentions upon which my
fundamental criticism of the frequency theory is founded. The
first of these relates to the method by which the class of reference
is to be determined. The magnitude of a probability is always
to be measured by the truth-frequency of some class; and this
class, it is allowed, must be determined by reference to the
premisses, on which the probability of the conclusion is to be
determined. But, as a given proposition belongs to innumerable
different classes, how are we to know which class the premisses
indicate as appropriate? What substitute has the frequency
theory to offer for judgments of relevance and indifference?
And without something of this kind, what principle is there for
uniquely determining the class, the truth-frequency of which is
to measure the probability of the argument? Indeed the
difficulties of showing how given premisses determine the class
of reference, by means of rules expressed in terms of previous
ideas, and without the introduction of any notion, which is new
and peculiar to probability, appear to me insurmountable.

Whilst no general criterion of choice seems to exist, where of two alternative classes neither includes the other, it might be thought that where one does include the other, the obvious course would be to take the narrowest and most specialised class. This procedure was examined and rejected by Venn: though the objection to it is due, not, as he supposed, to the lack of sufficient statistics in such cases upon which to found a generalisation, but to the inclusion in the class-concept of marks characteristic of the proposition in question, but nevertheless not relevant to the matter in hand. If the process of narrowing the class were to be carried to its furthest point, we should generally be left with a class whose only member is the proposition in question, for we generally know something about it which is true of no other proposition. We cannot, therefore, define the class of reference as being the class of propositions of which everything is true which is known to be true of the proposition whose probability we seek to determine. And, indeed, in those examples for which the frequency theory possesses the greatest prima facie plausibility, the class of reference is selected by taking account of some only of the known characteristics of the quaesitum, those characteristics, namely, which are relevant in the circumstances. In those cases in which one can admit that the probability can be measured by reference to a known truth-frequency, the class of reference is formed of propositions about which our relevant knowledge is the same as about the proposition under consideration. In these special cases we get the same result from the frequency theory as from the Principle of Indifference. But this does not serve to rehabilitate the frequency theory as a general explanation of probability, and goes rather to show that the theory of this Treatise is the generalised theory, comprehending within it such applications of the idea of statistical truth-frequency as have validity.
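The difficulty can be made concrete with a small sketch in Python. The population, its marks, and the helper names below are hypothetical and serve only as an illustration; the proposition whose truth-frequency is measured is "this person is colour-blind". As the class of reference is narrowed, the frequency changes, and when every known mark of person A is used the class shrinks to A alone, whose "frequency" is simply the truth or falsity of the very proposition in question.

```python
from fractions import Fraction

# Hypothetical population. Each record lists some known marks of a person and
# whether the proposition in question ("this person is colour-blind") is true.
population = [
    {"name": "A", "sex": "male", "hair": "black", "town": "Leeds", "colour_blind": True},
    {"name": "B", "sex": "male", "hair": "black", "town": "York", "colour_blind": False},
    {"name": "C", "sex": "male", "hair": "black", "town": "York", "colour_blind": False},
    {"name": "D", "sex": "male", "hair": "fair", "town": "Leeds", "colour_blind": True},
    {"name": "E", "sex": "male", "hair": "fair", "town": "York", "colour_blind": False},
    {"name": "F", "sex": "female", "hair": "black", "town": "Leeds", "colour_blind": False},
]


def reference_class(marks):
    """All members of the population sharing every mark in `marks`."""
    return [p for p in population if all(p[k] == v for k, v in marks.items())]


def truth_frequency(cls):
    """Proportion of members of `cls` for whom the proposition is true."""
    return Fraction(sum(p["colour_blind"] for p in cls), len(cls))


# Everything known about person A apart from the quaesitum itself.
known = {"sex": "male", "hair": "black", "town": "Leeds", "name": "A"}

for keys in (["sex"], ["sex", "hair"], ["sex", "town"], list(known)):
    cls = reference_class({k: known[k] for k in keys})
    print(keys, len(cls), truth_frequency(cls))
# ['sex'] 5 2/5
# ['sex', 'hair'] 3 1/3
# ['sex', 'town'] 2 1
# ['sex', 'hair', 'town', 'name'] 1 1
```

Which of the first three classes is the right one cannot be settled by the arithmetic; that choice is exactly the judgment of relevance for which, as argued above, the frequency theory offers no substitute.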

'Relevance' is an important term in probability, of which the meaning is readily intelligible. I have given my own definition of it already. But I do not know how it is to be explained in terms of the frequency theory. Whether supporters of this theory have fully appreciated the difficulty I much doubt. It is a fundamental issue involving the essence of the peculiarity of probability, which prevents its being explained away in terms of statistical frequency or anything else.

12.  Yet  perhaps  a  modified  view  of  the  frequency  theory 
could  be  evolved  which  would  avoid  this  difficulty,  and  I  proceed, 
therefore,  to  some  further  criticisms.     It  might  be  agreed  that  a 
novel  element  must  be  admitted  at  this  point,  and  that  relevancy 
must  be  determined  in  some  such  manner  as  has  been  explained 
in  earlier  chapters.     With  this  admission,  it  might  be  argued,  the 
theory  would  still  stand,  divested,  it  is  true,  of  some  of  its  original 
simplicity,   but  nevertheless  a  substantial  theory  differing  in 
important   respects,   although  not   quite   so   fundamentally   as 
before,  from  alternative  schemes. 

The  next  important  objection,  then,  is  concerned  with  the 
manner  in  which  the  principal  theorems  of  probability  are  to  be 
established  on  a  theory  of  frequency.  This  will  involve  an 
anticipation  in  some  part  of  later  arguments  ;  and  the  reader 
may  be  well  advised  to  return  to  the  following  paragraph  after 
he  has  finished  Part  II. 

13. Let us begin by a consideration of the 'Addition Theorem.' If a/h denotes the probability of a on hypothesis h, this theorem may be written

$$(a+b)/h = a/h + b/h - ab/h,$$

and may be read 'On hypothesis h the probability of "a or b" is equal to the probability of a + the probability of b - the probability of "both a and b."' This theorem, interpreted in some way or other, is universally assumed; and we must, therefore, inquire what proof of it the frequency theory can afford. A little symbolism will assist the argument: Let $\alpha_f$ represent the truth-frequency of any class $\alpha$, and let $a/_{\alpha}h$ stand for 'the probability of a on hypothesis h, $\alpha$ being the class of reference determined by this hypothesis.'¹ We then have $a/_{\alpha}h = \alpha_f$, and we require to prove a proposition, for values of $\gamma$ and $\delta$ not yet determined, which will be of the form:

$$(a+b)/_{\delta}h = a/_{\alpha}h + b/_{\beta}h - ab/_{\gamma}h.$$

Now if $\sigma'$ is the class of propositions (a + b) such that a is an $\alpha$ and b a $\beta$, it is easily shown by the ordinary arithmetic of classes that $\sigma'_f = \alpha_f + \beta_f - \alpha\beta_f$, where $\alpha\beta$ is the class of propositions which are members of both $\alpha$ and $\beta$. In the case, therefore, where $\delta = \sigma'$ and $\gamma = \alpha\beta$, an addition theorem of the required kind has been established.
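The "ordinary arithmetic of classes" referred to is, in its simplest form, the rule of inclusion and exclusion. The following Python sketch checks it exhaustively on small hypothetical classes, each member of which is represented, by assumption, as a pair of truth-values (one for the x-component, one for the y-component); the function names are illustrative only. It shows that the identity is purely arithmetical: it holds whenever all four frequencies are counted over the same members, which is precisely the condition that, as the next paragraph argues, the hypothesis h need not secure.

```python
from fractions import Fraction
from itertools import product


def freq(values):
    """Truth-frequency: the proportion of True entries in a finite sequence."""
    values = list(values)
    return Fraction(sum(values), len(values))


def check_addition(pairs):
    """Inclusion-exclusion for one finite class whose members are (x, y) pairs
    of truth-values: freq(x or y) == freq(x) + freq(y) - freq(x and y)."""
    f_or = freq(x or y for x, y in pairs)
    f_x = freq(x for x, _ in pairs)
    f_y = freq(y for _, y in pairs)
    f_and = freq(x and y for x, y in pairs)
    return f_or == f_x + f_y - f_and


# Exhaustive check over every possible class of one to four members.
for n in range(1, 5):
    for pairs in product(product([False, True], repeat=2), repeat=n):
        assert check_addition(pairs)
print("inclusion-exclusion holds for every class checked")
```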

But it does not follow by any reasonable rule that, if h determines $\alpha$ and $\beta$ as the appropriate classes of reference for a and b, h must necessarily determine $\sigma'$ and $\alpha\beta$ as the appropriate classes of reference for (a + b) and ab; it may, for instance, be the case that h, while it renders $\alpha$ and $\beta$ determinate, yields no information whatever regarding $\alpha\beta$, and points to some quite different class $\mu$ as the suitable class of reference for ab. On the frequency theory, therefore, we cannot maintain that the addition theorem is true in general, but only in those special cases where it happens that $\delta = \sigma'$ and $\gamma = \alpha\beta$.

1 The question, previously at issue, as to how the class of reference is determined by the hypothesis, is now ignored.

The following is a good example: We are given that the proportion of black-haired men in the population is $p_1/q_1$ and the proportion of colour-blind men $p_2/q_2$, and there is no known connection between black hair and colour-blindness: what is the probability that a man, about whom nothing special is known, is either black-haired or colour-blind?¹ If we represent the hypotheses by h and the alternatives by a and b, it would usually be held that, colour-blindness and black hair being independent for knowledge² relative to the given data, $ab/h = \frac{p_1 p_2}{q_1 q_2}$, and that, therefore, by the addition theorem,

$$(a+b)/h = \frac{p_1}{q_1} + \frac{p_2}{q_2} - \frac{p_1 p_2}{q_1 q_2}.$$

But, on the frequency theory, this result might be invalid; for $\alpha\beta_f = \frac{p_1 p_2}{q_1 q_2}$ only if this is the actual proportion in fact of persons who are both colour-blind and black-haired, and that this is the actual proportion cannot possibly be inferred from the independence for knowledge of the characters in question.³
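The point can be checked with hypothetical numbers. In the Python sketch below (the counts and function names are assumptions made for illustration), black-haired men are taken to be 1/5 and colour-blind men 1/10 of a population of 100. The value obtained from independence for knowledge is fixed by these two proportions, but the actual truth-frequency of "black-haired or colour-blind" depends on how many men are in fact both, and the two agree only when that number happens to be 2, that is, $p_1 p_2/q_1 q_2$ of the whole.

```python
from fractions import Fraction


def knowledge_estimate(f_black, f_colour_blind):
    """(a+b)/h computed with ab/h = p1p2/(q1q2), i.e. on independence for knowledge."""
    return f_black + f_colour_blind - f_black * f_colour_blind


def actual_frequency(n, black, colour_blind, both):
    """Truth-frequency of 'black-haired or colour-blind' in a finite population."""
    return Fraction(black + colour_blind - both, n)


n, black, colour_blind = 100, 20, 10          # hypothetical counts: 1/5 and 1/10
estimate = knowledge_estimate(Fraction(black, n), Fraction(colour_blind, n))

for both in (2, 0, 10):                       # three possible actual overlaps
    print(both, actual_frequency(n, black, colour_blind, both), estimate)
# 2 7/25 7/25
# 0 3/10 7/25
# 10 1/5 7/25
```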

Precisely the same difficulty arises in connection with the multiplication theorem $ab/h = a/bh \cdot b/h$.⁴ In the frequency notation, which is proposed above, the corresponding theorem will be of the form $ab/_{\delta}h = a/_{\gamma}bh \cdot b/_{\beta}h$. For this equation to be satisfied it is easily seen that $\delta$ must be the class of propositions xy such that x is a member of $\alpha$ and y of $\beta$, and $\gamma$ the class of propositions xb such that x is a member of $\alpha$; and, as in the case of the addition theorem, we have no guarantee that these classes $\gamma$ and $\delta$ will be those which the hypotheses bh and h will respectively determine as the appropriate classes of reference for a and ab.
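A hypothetical numerical check may help. In the Python sketch below, a class is represented, by assumption, as a list of pairs of truth-values (x, y), and `freq` is an illustrative helper. The frequency form of the multiplication theorem comes out as an identity only because the class used for the first factor is taken to be the subclass in which y is true; nothing in the arithmetic shows that the hypothesis bh would select that subclass as its class of reference.

```python
from fractions import Fraction


def freq(members, pred):
    """Truth-frequency of `pred` over a finite class of members."""
    members = list(members)
    return Fraction(sum(1 for m in members if pred(m)), len(members))


# Hypothetical class: each member is a pair (x_true, y_true).
cls = [(True, True), (False, True), (True, False), (False, False), (True, True)]

f_xy = freq(cls, lambda m: m[0] and m[1])     # frequency of "x and y"
f_y = freq(cls, lambda m: m[1])               # frequency of y
given_y = [m for m in cls if m[1]]            # the narrowed class
f_x_given_y = freq(given_y, lambda m: m[0])   # frequency of x within it

print(f_xy, f_x_given_y, f_y)                 # 2/5 2/3 3/5
assert f_xy == f_x_given_y * f_y
```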

In the case of the theorem of inverse probability⁵

$$\frac{b/ah}{c/ah} = \frac{a/bh}{a/ch} \cdot \frac{b/h}{c/h}$$

the same difficulty again arises, with an additional one when practical applications are considered. For the relative probabilities of our a priori hypotheses, b and c, will scarcely ever be capable of determination by means of known frequencies, and in the most legitimate instances of the inverse principle's operation we depend either upon an inductive argument or upon the Principle of Indifference. It is hard to think of an example in which the frequency conditions are even approximately satisfied.

1 In the course of the present discussion the disjunctive a + b is never interpreted so as to exclude the conjunctive ab.

2 For a discussion of this term see Chapter XVI. § 2.

3 Venn argues (Logic of Chance, pp. 173, 174) that there is an inductive ground for making this inference. The question of extending the fundamental theorems of a frequency theory of probability by means of induction is discussed in § 14 below.

4 Vide Chapter XII. § 6, and Chapter XIV. § 4.

5 Vide Chapter XIV. § 5.
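To see where the additional difficulty lies, the theorem of inverse probability can be evaluated for hypothetical values. In the Python sketch below (names and numbers are illustrative assumptions), the ratio a/bh : a/ch is held fixed at 3, as it might be if it rested on a known frequency, while the a priori ratio b/h : c/h is varied. Equal a priori probabilities, such as the Principle of Indifference would give when the evidence for b and c is symmetrical, leave b three times as probable as c; a different a priori ratio reverses the verdict.

```python
from fractions import Fraction


def posterior_ratio(lik_b, lik_c, prior_b, prior_c):
    """(b/ah) / (c/ah) = (a/bh) / (a/ch) * (b/h) / (c/h)."""
    return (lik_b / lik_c) * (prior_b / prior_c)


# Hypothetical likelihoods a/bh and a/ch, held fixed.
lik_b, lik_c = Fraction(3, 4), Fraction(1, 4)

# Two hypothetical a priori ratios b/h : c/h.
for prior_b, prior_c in [(Fraction(1, 2), Fraction(1, 2)),
                         (Fraction(1, 10), Fraction(9, 10))]:
    print(prior_b / prior_c, posterior_ratio(lik_b, lik_c, prior_b, prior_c))
# 1 3
# 1/9 1/3
```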

Thus an important class of case, in which arguments in probability, generally accepted as satisfactory, do not satisfy the frequency conditions given above, is that in which the notion is introduced of two propositions being, on certain data, independent for knowledge. The meaning and definition of this expression are discussed more fully in Part II.; but I do not see what interpretation the frequency theory can put upon it. Yet if the conception of 'independence for knowledge' is discarded, we shall be brought to a standstill in the vast majority of problems, which are ordinarily considered to be problems in probability, simply from the lack of sufficiently detailed data. Thus the frequency theory is not adequate to explain the processes of reasoning which it sets out to explain. If the theory restricts its operation, as would seem necessary, to those cases in which we know precisely how far the true members of $\alpha$ and $\beta$ overlap, the vast majority of arguments in which probability has been employed must be rejected.
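What the frequency theory can assert when the overlap of $\alpha$ and $\beta$ is not known may be sketched as follows. The bounds used here are the standard Fréchet bounds on a joint proportion given its two marginal proportions; they are a familiar result of elementary probability, not drawn from the text, and the function name is illustrative. With the proportions 1/5 and 1/10 of the earlier example, the truth-frequency of "a or b" is fixed only within an interval from 1/5 to 3/10.

```python
from fractions import Fraction


def disjunction_bounds(f_a, f_b):
    """Least and greatest possible truth-frequency of 'a or b' when only the
    proportions f_a and f_b are known and their overlap is not."""
    overlap_min = max(Fraction(0), f_a + f_b - 1)   # overlap forced by crowding
    overlap_max = min(f_a, f_b)                     # one class inside the other
    return f_a + f_b - overlap_max, f_a + f_b - overlap_min


low, high = disjunction_bounds(Fraction(1, 5), Fraction(1, 10))
print(low, high)  # 1/5 3/10
```

Any single value within the interval, such as 7/25, requires a premiss about the overlap which the two proportions alone do not supply.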

14. An appeal to some further principle is, therefore, required
before the ordinary apparatus of probable inference can be
established on considerations of statistical frequency; and it may
have  occurred  to  some  readers  that  assistance  may  be  obtained 
from  the  principles  of  induction.  Here  also  it  will  be  necessary 
to  anticipate  a  subsequent  discussion.  If  the  argument  of  Part 
III.  is  correct,  nothing  is  more  fatal  than  Induction  to  the  theory 
now  under  criticism.  For,  so  far  from  Induction's  lending 
support  to  the  fundamental  rules  of  probability,  it  is  itself 
dependent  on  them.  In  any  case,  it  is  generally  agreed  that 
an  inductive  conclusion  is  only  probable,  and  that  its  probability 
increases  with  the  number  of  instances  upon  which  it  is  founded. 
According  to  the  frequency  theory,  this  belief  is  only  justified  if 
the  majority  of  inductive  conclusions  actually  are  true,  and  it 
will  be  false,  even  on  our  existing  data,  that  any  of  them  are  even 
probable,  if  the  acknowledged  possibility  that  a  majority  are 
false  is  an  actuality.  Yet  what  possible  reason  can  the  frequency 
theory  offer,  which  does  not  beg  the  question,  for  supposing  that 
a  majority  are  true  ?  And  failing  this,  what  ground  have  we 
for  believing  the  inductive  process  to  be  reasonable  ?  Yet  we 
invariably assume that with our existing knowledge it is logically
reasonable  to  attach  some  weight  to  the  inductive  method,  even 
if  future  experience  shows  that  not  one  of  its  conclusions  is  verified 
in  fact.  The  frequency  theory,  therefore,  in  its  present  form  at 
any  rate,  entirely  fails  to  explain  or  justify  the  most  important 
source  of  the  most  usual  arguments  in  the  field  of  probable 
inference. 

15.  The  failure  of  the  frequency  theory  to  explain  or  justify 
arguments  from  induction  or  analogy  suggests  some  remarks  of  a 
more  general  kind.  While  it  is  undoubtedly  the  case  that  many 
valuable judgments in probability are partly based on a knowledge of statistical frequencies, and that many more can be held, with some plausibility, to be indirectly derived from them, there remains a great mass of probable argument which it would be paradoxical to justify in the same manner. It is not sufficient, therefore, even if it is possible, to show that the theory can be developed in a self-consistent manner; it must also be shown
how  the  body  of  probable  argument,  upon  which  the  greater 
part  of  our  generally  accepted  knowledge  seems  to  rest,  can 
be  explained  in  terms  of  it ;  for  it  is  certain  that  much  of 
it  does  not  appear  to  be  derived  from  premisses  of  statistical 
frequency. 

Take,  for  instance,  the  intricate  network  of  arguments  upon 
which  the  conclusions  of  The  Origin  of  Species  are  founded  : 
how  impossible  it  would  be  to  transform  them  into  a  shape  in 
which  they  would  be  seen  to  rest  upon  statistical  frequency  ! 
Many  individual  arguments,  of  course,  are  explicitly  founded 
upon  such  considerations  ;  but  this  only  serves  to  differentiate 
them  more  clearly  from  those  which  are  not.  Darwin's  own 
account  of  the  nature  of  the  argument  may  be  quoted  :  "  The 
belief  in  Natural  Selection  must  at  present  be  grounded  entirely 
on  general  considerations  :  (1)  on  its  being  a  vera  causa,  from 
the  struggle  for  existence  and  the  certain  geological  fact  that 
species  do  somehow  change  ;  (2)  from  the  analogy  of  change 
under  domestication  by  man's  selection  ;  (3)  and  chiefly  from 
this  view  connecting  under  an  intelligible  point  of  view  a  host 
of  facts.  When  we  descend  to  details  ...  we  cannot  prove  that 
a  single  species  has  changed  ;  nor  can  we  prove  that  the  supposed 
changes  are  beneficial,  which  is  the  groundwork  of  the  theory  ; 
nor  can  we  explain  why  some  species  have  changed  and  others 
have not."¹ Not only in the main argument, but in many of the subsidiary discussions,² an elaborate combination of induction and analogy is superimposed upon a narrow and limited knowledge of statistical frequency. And this is equally the case in
almost  all  everyday  arguments  of  any  degree  of  complexity. 
The  class  of  judgments,  which  a  theory  of  statistical  frequency 
can  comprehend,  is  too  narrow  to  justify  its  claim  to  present  a 
complete  theory  of  probability. 

16.  Before  concluding  this  chapter,  we  should  not  overlook 
the  element  of  truth  which  the  frequency  theory  embodies  and 
which  provides  its  plausibility.     In  the  first  place,  it  gives  a 
true  account,  so  long  as  it  does  not  argue  that  probability  and 
frequency  are  identical,  of  a  large  number  of  the  most  precise 
arguments  in  probability,  and  of  those  to  which  mathematical 
treatment  is  easily  applicable.     It  is  this  characteristic  which 
has   recommended   it   to   statisticians,   and   explains   the   large 
measure  of  its  acceptance  in  England  at  the  present  time  ;    for 
the  popularity  in  this  country  of  an  opinion,  which  has,  so  far 
as  I  know,  no  thorough  supporters  abroad,  may  reasonably  be 
attributed   to  the  chance  which  has  led  most  of  the  English 
writers,  who  have  paid  much  attention  to  probability  in  recent 
years,  to  approach  the  subject  from  the  statistical  side. 

In the second place, the statement that the probability of an event is measured by its actual frequency of occurrence 'in the long run' has a very close connection with a valid conclusion which can be derived, in certain cases, from Bernoulli's theorem. This theorem and its connection with the theory of frequency will be the subject of Chapter XXIX.
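For orientation only, the following Python sketch evaluates exactly, for an assumed chance p = 1/2 and tolerance ε = 1/10, the probability that the relative frequency in n independent trials lies within ε of p. That this probability tends to 1 as n grows is the familiar form of Bernoulli's theorem; the function name and parameters are illustrative, and the sketch says nothing about the conditions under which Chapter XXIX finds the theorem applicable. What it yields is a probability about a frequency, not an identity between the two.

```python
from fractions import Fraction
from math import comb


def prob_frequency_within(p, n, eps):
    """Probability, over n independent trials each with chance p, that the
    relative frequency k/n lies within eps of p (binomial sum, exact)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - p) <= eps
    )


p, eps = Fraction(1, 2), Fraction(1, 10)
for n in (10, 100, 1000):
    print(n, float(prob_frequency_within(p, n, eps)))  # n = 10 gives 0.65625
```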

17. The absence of a recent exposition of the logical basis of the frequency theory by any of its adherents has been a great disadvantage to me in criticising it. It is possible that some of the opinions, which I have examined at length, are now held by no one; nor am I absolutely certain, at the present stage of the inquiry, that a partial rehabilitation of the theory may not be possible. But I am sure that the objections which I have raised cannot be met without a great complication of the theory, and without robbing it of the simplicity which is its greatest preliminary recommendation. Until the theory has been given new foundations, its logical basis is not so secure as to permit controversial applications of it in practice. A good deal of modern statistical work may be based, I think, upon an inconsistent logical scheme, which, avowedly founded upon a theory of frequency, introduces principles which this theory has no power to justify.

1 Letter to G. Bentham, Life and Letters, vol. iii. p. 25.

2 E.g. in the discussion on the relative effect of disuse and selection in reducing unnecessary organs to a rudimentary condition.


CHAPTER   IX 

THE CONSTRUCTIVE THEORY OF PART I. SUMMARISED

1. THAT part of our knowledge which we obtain directly, supplies the premisses of that part which we obtain by argument. From these premisses we seek to justify some degree of rational belief about all sorts of conclusions. We do this by perceiving certain logical relations between the premisses and the conclusions. The kind of rational belief which we infer in this manner is termed probable (or in the limit certain), and the logical relations, by the perception of which it is obtained, we term relations of probability.

The  probability  of  a  conclusion  a  derived  from  premisses  h 
we  write  a/h  ;  and  this  symbol  is  of  fundamental  importance. 

2. The object of the Theory or Logic of Probability is to systematise such processes of inference. In particular it aims at elucidating rules by means of which the probabilities of different arguments can be compared. It is of great practical importance to determine which of two conclusions is on the evidence the more probable.

The  most  important  of  these  rules  is  the  Principle  of 
Indifference.  According  to  this  Principle  we  must  rely  upon 
direct  judgment  for  discriminating  between  the  relevant  and 
the  irrelevant  parts  of  the  evidence.  We  can  only  discard 
those  parts  of  the  evidence  which  are  irrelevant  by  seeing  that 
they have no logical bearing on the conclusion. The irrelevant
evidence  being  thus  discarded,  the  Principle  lays  it  down  that 
if  the  evidence  for  either  conclusion  is  the  same  (i.e.  symmetrical), 
then  their  probabilities  also  are  the  same  (i.e.  equal). 

If, on the other hand, there is additional evidence (i.e. in addition to the symmetrical evidence) for one of the conclusions, and this evidence is favourably relevant, then that conclusion is the more probable. Certain rules have been given by which to judge whether or not evidence is favourably relevant. And by combinations of these judgments of preference with the judgments of indifference warranted by the Principle of Indifference more complicated comparisons are possible.

3.  There   are,   however,  many  cases   in   which  these   rules 
furnish  no  means  of  comparison  ;   and  in  which  it  is  certain  that 
it  is  not  actually  within  our  power  to  make  the  comparison.    It 
has  been  argued  that  in  these  cases  the  probabilities  are,  in  fact, 
not  comparable.     As  in  the  example  of  similarity,  where  there 
are  different  orders  of  increasing  and  diminishing  similarity,  but 
where  it  is  not  possible  to  say  of  every  pair  of  objects  which  of 
them  is  on  the  whole  the  more  like  a  third  object,  so  there  are 
different  orders  of  probability,  and  probabilities,  which  are  not 
of  the  same  order,  cannot  be  compared. 

4.  It  is  sometimes  of  practical  importance,  when,  for  example, 
we  wish  to  evaluate  a  chance  or  to  determine  the  amount  of 
our  expectation,  to  say  not  only  that  one  probability  is  greater 
than  another,  but  by  how  much  it  is  greater.     We  wish,  that  is 
to  say,  to  have  a  numerical  measure  of  degrees  of  probability. 

This  is  only  occasionally  possible.  A  rule  can  be  given  for 
numerical  measurement  when  the  conclusion  is  one  of  a  number 
of  equiprobable,  exclusive,  and  exhaustive  alternatives,  but  not 
otherwise. 
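The rule, and its limitation, can be expressed as a small Python sketch. The function name and its boolean arguments are hypothetical; the three conditions are passed in as judgments already made, because the code cannot itself decide whether alternatives are equiprobable, which on the view of this Treatise rests on the Principle of Indifference. When any condition fails, the function returns no number at all.

```python
from fractions import Fraction


def numerical_measure(conclusion, alternatives, equiprobable, exclusive, exhaustive):
    """Numerical probability of `conclusion`, available only when it is one of a
    set of alternatives judged equiprobable, mutually exclusive and exhaustive."""
    if not (equiprobable and exclusive and exhaustive):
        return None  # no rule for numerical measurement applies
    if conclusion not in alternatives:
        raise ValueError("the conclusion must be one of the alternatives")
    return Fraction(1, len(alternatives))


faces = ["one", "two", "three", "four", "five", "six"]
print(numerical_measure("three", faces, True, True, True))   # 1/6
print(numerical_measure("three", faces, True, True, False))  # None
```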

5.  In  Part  II.   I  proceed  to  a  symbolic  treatment   of  the 
subject,  and  to  the  greater  systematisation,  by  symbolic  methods 
on  the  basis  of  certain  axioms,  of  the  rules  of  probable  argument. 

In  Parts  III.,  IV.,  and  V.  the  nature  of  certain  very  important 
types  of  probable  argument  of  a  complex  kind  will  be  treated 
in  detail ;  in  Part  III.  the  methods  of  Induction  and  Analogy, 
in  Part  IV.  certain  semi-philosophical  problems,  and  in  Part  V. 
the logical foundations of the methods of inference now commonly known as statistical.


PART II

FUNDAMENTAL  THEOREMS 
CHAPTER X

INTRODUCTORY 

1.  IN  Part  I.  we  have  been  occupied  with  the  epistemology  of  our 
subject,  that  is  to  say,  with  what  we  know  about  the  characteristics 
and the justification of probable knowledge. In Part II. I pass
to  its  Formal  Logic.  I  am  not  certain  of  how  much  positive  value 
this  Part  will  prove  to  the  reader.  My  object  in  it  is  to  show 
that,  starting  from  the  philosophical  ideas  of  Part  I.,  we  can 
deduce  by  rigorous  methods  out  of  simple  and  precise  definitions 
the  usually  accepted  results,  such  as  the  theorems  of  the  addition 
and  multiplication  of  probabilities  and  of  inverse  probability. 
The  reader  will  readily  perceive  that  this  Part  would  never  have 
been  written  except  under  the  influence  of  Mr.  Russell's  Principia 
Mathematica. But I am sensible that it may suffer from the
over-elaboration  and  artificiality  of  this  method  without  the 
justification  which  its  grandeur  of  scale  affords  to  that  great  work. 
In  common,  however,  with  other  examples  of  formal  method, 
this attempt has had the negative advantage of compelling the
author  to  make  his  ideas  precise  and  of  discovering  fallacies  and 
mistakes.  It  is  a  part  of  the  spade-work  which  a  conscientious 
author  has  to  undertake  ;  though  the  process  of  doing  it  may 
be  of  greater  value  to  him  than  the  results  can  be  to  the  reader, 
who  is  concerned  to  know,  as  a  safeguard  of  the  reliability  of  the 
rest  of  the  construction,  that  the  thing  can  be  done,  rather  than 
to  examine  the  architectural  plans  in  detail.  In  the  development 
of  my  own  thought,  the  following  chapters  have  been  of  great 
importance.  For  it  was  through  trying  to  prove  the  fundamental 
theorems  of  the  subject  on  the  hypothesis  that  Probability  was 
a  relation  that  I  first  worked  my  way  into  the  subject  ;  and  the 
rest  of  this  Treatise  has  arisen  out  of  attempts  to  solve  the 
successive  questions  to  which  the  ambition  to  treat  Probability 
as  a  branch  of  Formal  Logic  first  gave  rise. 
A further occasion of diffidence and apology in introducing
this  Part  of  my  Treatise  arises  out  of  the  extent  of  my  debt  to 
Mr.  W.  E.  Johnson.  I  worked  out  the  first  scheme  in  complete 
independence  of  his  work  and  ignorant  of  the  fact  that  he  had 
thought,  more  profoundly  than  I  had,  along  the  same  lines  ;  I 
have  also  given  the  exposition  its  final  shape  with  my  own  hands. 
But  there  was  an  intermediate  stage,  at  which  I  submitted  what 
I  had  done  for  his  criticism,  and  received  the  benefit  not  only  of 
criticism  but  of  his  own  constructive  exercises.  The  result  is 
that  in  its  final  form  it  is  difficult  to  indicate  the  exact  extent  of 
my  indebtedness  to  him.  When  the  following  pages  were  first 
in  proof,  there  seemed  little  likelihood  of  the  appearance  of  any 
work  on  Probability  from  his  own  pen,  and  I  do  not  now  proceed 
to  publication  with  so  good  a  conscience,  when  he  is  announcing 
the  approaching  completion  of  a  work  on  Logic  which  will  include 
"  Problematic  Inference." 

I  propose  to  give  here  a  brief  summary  of  the  five  chapters 
following,  without  attempting  to  be  rigorous  or  precise.  I  shall 
then  be  free  to  write  technically  in  Chapters  XI.-XV.,  inviting 
the  reader,  who  is  not  specially  interested  in  the  details  of  this 
sort  of  technique,  to  pass  them  by. 

2.  Probability  is  concerned  with  arguments,  that  is  to  say, 
with  the  "  bearing  "  of  one  set  of  propositions  upon  another  set. 
If  we  are  to  deal  formally  with  a  generalised  treatment  of  this 
subject,  we  must  be  prepared  to  consider  relations  of  probability 
between  any  pair  of  sets  of  propositions,  and  not  only  between 
sets  which  are  actually  the  subject  of  knowledge.  But  we  soon 
find  that  some  limitation  must  be  put  on  the  character  of  sets  of 
propositions  which  we  can  consider  as  the  hypothetical  subject 
of  an  argument,  namely,  that  they  must  be  possible  subjects  of 
knowledge.  We  cannot,  that  is  to  say,  conveniently  apply  our 
theorems  to  premisses  which  are  self-contradictory  and  formally 
inconsistent  with  themselves. 

For the purpose of this limitation we have to make a distinction between a set of propositions which is merely false in fact and a set which is formally inconsistent with itself.¹ This leads us to the conception of a group of propositions, which is defined as a set of propositions such that: (i.) if a logical principle belongs to it, all propositions which are instances of that logical principle also belong to it; (ii.) if the proposition p and the proposition 'not-p or q' both belong to it, then the proposition q also belongs to it; (iii.) if any proposition p belongs to it, then the contradictory of p is excluded from it. If the group defined by one part of a set of propositions excludes a proposition which belongs to a group defined by another part of the set, then the set taken as a whole is inconsistent with itself and is incapable of forming the premiss of an argument.

1 Spinoza had in mind, I think, the distinction between Truth and Probability in his treatment of Necessity, Contingence, and Possibility. Res enim omnes ex data Dei natura necessario sequutae sunt, et ex necessitate naturae Dei determinatae sunt ad certo modo existendum et operandum (Ethices i. 33). That is to say, everything is, without qualification, true or false. At res aliqua nulla alia de causa contingens dicitur, nisi respectu defectus nostrae cognitionis (Ethices i. 33, scholium). That is to say, Contingence, or, as I term it, Probability, solely arises out of the limitations of our knowledge. Contingence in this wide sense, which includes every proposition which, in relation to our knowledge, is only probable (this term covering all intermediate degrees of probability), may be further divided into Contingence in the strict sense, which corresponds to an a priori or formal probability exceeding zero, and Possibility; that is to say, into formal possibility and empirical possibility. Res singulares voco contingentes, quatenus, dum ad earum solam essentiam attendimus, nihil invenimus, quod earum existentiam necessario ponat, vel quod ipsam necessario secludat. Easdem res singulares voco possibiles, quatenus, dum ad causas, ex quibus produci debent, attendimus, nescimus, an ipsae determinatae sint ad easdem producendum (Ethices iv. Def. 3, 4).

The conception of a group leads on to a precise definition of one proposition requiring another (which in the realm of assertion corresponds to relevance in the realm of probability), and of logical priority as being an order of propositions arising out of their relation to those special groups, or real groups, which are in fact the subject of knowledge. Logical priority has no absolute signification, but is relative to a specific body of knowledge, or, as it has been termed in the traditional logic, to the Universe of Reference.

It also enables us to reach a definition of inference distinct from implication, as defined by Mr. Russell. This is a matter of very great importance. Readers who are acquainted with the work of Mr. Russell and his followers will probably have noticed that the contrast between his work and that of the traditional logic is by no means wholly due to the greater precision and more mathematical character of his technique. There is a difference also in the design. His object is to discover what assumptions are required in order that the formal propositions generally accepted by mathematicians and logicians may be obtainable as the result of successive steps or substitutions of a few very simple types, and to lay bare by this means any inconsistencies which may exist in received results. But beyond the fact that the conclusions to which he seeks to lead up are those of common sense, and that the uniform type of argument, upon the validity of which each step of his system depends, is of a specially obvious kind, he is not concerned with analysing the methods of valid reasoning which we actually employ. He concludes with familiar results, but he reaches them from premisses, which have never occurred to us before, and by an argument so elaborate that our minds have difficulty in following it. As a method of setting forth the system of formal truth, which shall possess beauty, inter-dependence, and completeness, his is vastly superior to any which has preceded it. But it gives rise to questions about the relation in which ordinary reasoning stands to this ordered system, and, in particular, as to the precise connection between the process of inference, in which the older logicians were principally interested but which he ignores, and the relation of implication on which his scheme depends.

'p implies q' is, according to his definition, exactly equivalent
to the disjunction 'q is true or p is false.' If q is true, 'p implies
q' holds for all values of p; and similarly, if p is false, the
implication holds for all values of q. This is not what we mean
when we say that q can be inferred or follows from p. For whatever
the exact meaning of inference may be, it certainly does not
hold between all pairs of true propositions, and is not of such a
character that every proposition follows from a false one. It is
not true that 'A male now rules over England' follows or can be
inferred from 'A male now rules over France'; or 'A female now
rules over England' from 'A female now rules over France';
whereas, on Mr. Russell's definition, the corresponding
implications hold simply in virtue of the facts that 'A male now
rules over England' is true and 'A female now rules over France'
is false.
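Mr. Russell's definition can be checked mechanically. The Python sketch below is only an illustration: `implies` is our own name for the disjunction 'q is true or p is false', and the truth values given to the four sentences are those the text takes for granted at the time of writing.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication in Russell's sense: 'q is true or p is false'."""
    return q or not p

# Truth table: the implication fails only when p is true and q is false.
for p, q in product([True, False], repeat=2):
    print(f"p={p!s:5} q={q!s:5} p implies q: {implies(p, q)}")

# Truth values assumed by the text's examples (at the time of writing).
male_rules_england = True
male_rules_france = False
female_rules_england = False
female_rules_france = False

# Both implications hold, although neither conclusion can be inferred.
print(implies(male_rules_france, male_rules_england))      # True: q is true
print(implies(female_rules_france, female_rules_england))  # True: p is false
```

The table makes the objection concrete. Every row with a true q or a false p yields True, whatever the connection between the two sentences.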

The distinction between the Relatival Logic of Inference and
Probability, and Mr. Russell's Universal Logic of Implication,
seems to be that the former is concerned with the relations of
propositions in general to a particular limited group. Inference
and Probability depend for their importance upon the fact that
in actual reasoning the limitation of our knowledge presents us
with a particular set of propositions, to which we must relate any
other proposition about which we seek knowledge. The course
of an argument and the results of reasoning depend, not simply
on what is true, but on the particular body of knowledge from
which we have set out. Ultimately, indeed, Mr. Russell cannot
avoid concerning himself with groups. For his aim is to discover
the smallest set of propositions which specify our formal
knowledge, and then to show that they do in fact specify it. In this
enterprise, being human, he must confine himself to that part of
formal truth which we know, and the question, how far his
axioms comprehend all formal truth, must remain insoluble.
But his object, nevertheless, is to establish a train of implications
between formal truths; and the character and the justification of
rational argument as such is not his subject.

3. Passing on from these preliminary reflections, our first
task is to establish the axioms and definitions which are to make
operative our symbolical processes. These processes are almost
entirely a development of the idea of representing a probability
by the symbol a/h, where h is the premiss of an argument and a
its conclusion. It might have been more in accordance with our
fundamental ideas to have employed the symbol a/h to designate
the argument from h to a, and to have represented the probability
of the argument, or rather the degree of rational belief about a
which the argument authorises, by the symbol P(a/h). This would
correspond to the symbol V(a/h) which has been employed in
Chapter VI. for the evidential value of the argument as distinct
from its probability. But in a section where we are only concerned
with probabilities, the use of P(a/h) would have been unnecessarily
cumbrous, and it is, therefore, convenient to drop the prefix P and
to denote the probability itself by a/h.

The discovery of a convenient symbol, like that of an essential
word, has often proved of more than verbal importance. Clear
thinking on the subject of Probability is not possible without a
symbol which takes explicit account of the premiss of the
argument as well as of its conclusion; and endless confusion has
arisen through discussions about the probability of a conclusion
without reference to the argument as a whole. I claim, therefore,
the introduction of the symbol a/h as an essential step towards
any progress in the subject.

4. Inasmuch as relations of Probability cannot be assumed
to  possess  the  properties  of  numbers,  the  terms  addition  and 
multiplication  of  probabilities  have  to  be  given  appropriate 
meanings  by  definition.  It  is  convenient  to  employ  these 
familiar  expressions,  rather  than  to  invent  new  ones,  because  the 
properties  which  arise  out  of  our  definitions  of  addition  and 
multiplication  in  Probability  are  analogous  to  those  of  addition 
and  multiplication  in  Arithmetic.  But  the  process  of  establishing 
these  properties  is  a  little  complicated  and  occupies  the  greater 
part  of  Chapter  XII. 

The  most  important  of  the  definitions  of  Chapter  XII.  are  the 
following  (the  numbers  referring  to  the  numbers  of  Chapter 
XII.)  : 

II. The Definition of Certainty: $a/h = 1$.

III. The Definition of Impossibility: $a/h = 0$.

VI. The Definition of Inconsistency: $ah$ is inconsistent if
$a/h = 0$.

VII. The Definition of a Group: the class of propositions $a$
such that $a/h = 1$ is the group $h$.

VIII. The Definition of Equivalence: if $b/ah = 1$ and $a/bh = 1$,
then $(a \equiv b)/h = 1$.

IX. The Definition of Addition:1 $ab/h + a\bar{b}/h = a/h$.

X. The Definition of Multiplication: $ab/h = a/bh \cdot b/h = b/ah \cdot a/h$.

The symbolical development of the subject largely
proceeds out of these definitions of Addition and Multiplication.
It is to be observed that they give a meaning, not to the addition
and multiplication of any pairs of probabilities, but only to pairs
which satisfy a certain form. The definition of Multiplication
may be read: 'the probability of both a and b given h is equal
to the probability of a given bh, multiplied by the probability of
b given h.'

XI. The Definition of Independence: if $a_1/a_2h = a_1/h$ and
$a_2/a_1h = a_2/h$, then $a_1/h$ and $a_2/h$ are independent.

XII. The Definition of Irrelevance: if $a_1/a_2h = a_1/h$, then $a_2$ is
irrelevant to $a_1/h$.
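These definitions do not presuppose that probabilities are numbers. In the special case where they happen to be numerically measurable, however, the definitions can be checked by machine. The Python sketch below rests on an assumption that is not the text's own model. It takes a finite set of equally weighted cases and treats a proposition as the set of cases in which it is true. The conjunction ab is then an intersection, the contradictory is a complement, and a/h is the fraction of the h-cases in which a holds. The names `prob`, `conj` and `neg` are illustrative.

```python
from fractions import Fraction

# Assumed model: eight equally weighted cases; a proposition is the set of
# cases in which it is true. This numerical model is for illustration only.
CASES = frozenset(range(8))

def prob(a: frozenset, h: frozenset) -> Fraction:
    """a/h: the fraction of the cases satisfying h in which a also holds."""
    if not h:
        raise ValueError("premiss h is inconsistent (no cases)")
    return Fraction(len(a & h), len(h))

def conj(*props: frozenset) -> frozenset:
    """The conjunction 'ab...' is the intersection of the case sets."""
    result = CASES
    for p in props:
        result = result & p
    return result

def neg(a: frozenset) -> frozenset:
    """The contradictory of a."""
    return CASES - a

h = CASES                          # a premiss true in every case
a = frozenset({0, 1, 2, 3})
b = frozenset({0, 1, 4, 5})
d = frozenset({0, 1, 2, 4})

# II. Certainty and III. Impossibility
assert prob(h, h) == 1 and prob(neg(h), h) == 0
# IX. Addition: ab/h + a(not-b)/h = a/h
assert prob(conj(a, b), h) + prob(conj(a, neg(b)), h) == prob(a, h)
# X. Multiplication: ab/h = a/bh . b/h = b/ah . a/h
assert prob(conj(a, b), h) == prob(a, conj(b, h)) * prob(b, h)
assert prob(conj(a, b), h) == prob(b, conj(a, h)) * prob(a, h)
# XII. Irrelevance: b is irrelevant to a/h, since a/bh = a/h
print(prob(a, conj(b, h)) == prob(a, h))   # True
# XI. Independence: a is also irrelevant to b/h
print(prob(b, conj(a, h)) == prob(b, h))   # True
# By contrast, d is relevant to a/h
print(prob(a, conj(d, h)) == prob(a, h))   # False
```

The `ValueError` for an empty premiss mirrors the requirement that h be self-consistent.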

5. In Chapter XIII. these definitions, supplemented by a few
axioms, are employed to demonstrate the fundamental theorems
of Certain or Necessary Inference. The interest of this chiefly
lies in the fact that these theorems include those which the
traditional Logic has termed the Laws of Thought, as for example
the Law of Contradiction and the Law of Excluded Middle.
These are here exhibited as a part of the generalised theory
of Inference or Rational Argument, which includes probable
Inference as well as certain Inference. The object of this chapter
is to show that the ordinarily accepted rules of Inference can in
fact be deduced from the definitions and axioms of Chapter XII.

1 $\bar{b}$ stands for the contradictory of b.

6.  In  Chapter  XIV.  I  proceed  to  the  fundamental  Theorems 
of  Probable  Inference,  of  which  the  following  are  the  most 
interesting  : 

Addition Theorem: $(a + b)/h = a/h + b/h - ab/h$, which reduces
to $(a + b)/h = a/h + b/h$ where a and b are mutually exclusive;
and, if $p_1 p_2 \ldots p_n$ form, relative to h, a set of exclusive and
exhaustive alternatives, $a/h = \sum_{r=1}^{n} ap_r/h$.

Theorem of Irrelevance: If $a/h_1h_2 = a/h_1$, then $a/h_1\bar{h}_2 = a/h_1$;
i.e. if a proposition is irrelevant, its contradictory also is irrelevant.

Theorem of Independence: If $a_2/a_1h = a_2/h$, then $a_1/a_2h = a_1/h$;
i.e. if $a_1$ is irrelevant to $a_2/h$, it follows that $a_2$ is irrelevant
to $a_1/h$ and that $a_1/h$ and $a_2/h$ are independent.

Multiplication Theorem: If $a_1/h$ and $a_2/h$ are independent,
$a_1a_2/h = a_1/h \cdot a_2/h$.

Theorem of Inverse Probability:

$$\frac{a_1/bh}{a_2/bh} = \frac{b/a_1h}{b/a_2h} \cdot \frac{a_1/h}{a_2/h}.$$

Further, if $a_1/h = p_1$, $a_2/h = p_2$, $b/a_1h = q_1$, $b/a_2h = q_2$,
and $a_1/bh + a_2/bh = 1$, then $a_1/bh = \dfrac{p_1q_1}{p_1q_1 + p_2q_2}$;
and if $a_1/h = a_2/h$, then $a_1/bh = \dfrac{q_1}{q_1 + q_2}$. This last result
is equivalent to the statement that the probability of $a_1$ when
we know b is equal to $q_1/(q_1 + q_2)$, where $q_1$ is the probability of b
when we know $a_1$ and $q_2$ its probability when we know $a_2$. This
theorem, enunciated with varying degrees of inaccuracy, appears
in all Treatises on Probability, but is not generally proved.
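In the special case where the probabilities are numerical, the Addition Theorem and the Theorem of Inverse Probability can be checked on a small example. The Python sketch below assumes a model that the text does not assume. It takes equally weighted cases and treats a proposition as a set of cases, reading 'a + b' as 'a or b', that is, as a set union. The helper `inverse_probability` is a hypothetical name. It extends the two-alternative formula to n exclusive and exhaustive alternatives $a_r$, computing $a_r/bh = p_rq_r / \sum_s p_sq_s$.

```python
from fractions import Fraction

# Assumed numerical model (illustrative only): twelve equally weighted cases;
# a proposition is the set of cases in which it is true.
CASES = frozenset(range(12))

def prob(a: frozenset, h: frozenset) -> Fraction:
    """a/h: the fraction of the h-cases in which a also holds (h non-empty)."""
    return Fraction(len(a & h), len(h))

def inverse_probability(priors, likelihoods):
    """Given priors p_r = a_r/h and likelihoods q_r = b/a_r h for exclusive and
    exhaustive alternatives a_r, return a_r/bh = p_r q_r / sum_s p_s q_s."""
    weights = [p * q for p, q in zip(priors, likelihoods)]
    total = sum(weights)
    return [w / total for w in weights]

h = CASES
a = frozenset(range(0, 6))
b = frozenset(range(4, 9))

# Addition Theorem: (a + b)/h = a/h + b/h - ab/h, reading 'a + b' as 'a or b'.
assert prob(a | b, h) == prob(a, h) + prob(b, h) - prob(a & b, h)

# Exclusive and exhaustive alternatives p_1, p_2, p_3: a/h = sum of ap_r/h.
alternatives = [frozenset(range(0, 4)), frozenset(range(4, 8)), frozenset(range(8, 12))]
assert prob(a, h) == sum(prob(a & p, h) for p in alternatives)

# Inverse Probability, taking a_1 = a and a_2 = not-a as the alternatives.
a1, a2 = a, CASES - a
posterior = inverse_probability([prob(a1, h), prob(a2, h)],
                                [prob(b, a1 & h), prob(b, a2 & h)])
assert posterior[0] == prob(a1, b & h)   # agrees with a_1/bh computed directly
print(posterior)                         # [Fraction(2, 5), Fraction(3, 5)]

# Equal priors: a_1/bh reduces to q_1 / (q_1 + q_2).
print(inverse_probability([Fraction(1, 2)] * 2, [Fraction(3, 4), Fraction(1, 4)]))
# [Fraction(3, 4), Fraction(1, 4)]
```

The sketch uses exact `Fraction` arithmetic. Floating-point rounding therefore cannot hide a failure of the identities.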

Chapter  XIV.  concludes  with  some  elaborate  theorems  on  the 
combination  of  premisses  based  on  a  technical  symbolic  device, 
known  as  the  Cumulative  Formula,  which  is  the  work  of  Mr.  W.  E. 
Johnson. 

7. In Chapter XV. I bring the non-numerical theory of
probability developed in the preceding chapters into connection
with the usual numerical conception of it, and demonstrate how
and in what class of cases a meaning can be given to a numerical
measure of a relation of probability. This leads on to what
may be termed numerical approximation, that is to say, the
relating of probabilities which are not themselves numerical
to probabilities which are numerical, by means of the relations
of greater and less. In this way numerical limits may in some
cases be ascribed to probabilities which are not capable of
numerical measure.


CHAPTER  XI 

THE   THEORY   OF   GROUPS,    WITH   SPECIAL   REFERENCE   TO 
LOGICAL   CONSISTENCE,    INFERENCE,    AND   LOGICAL   PRIORITY 

1. THE Theory of Probability deals with the relation between
two sets of propositions, such that, if the first set is known to be
true, the second can be known with the appropriate degree of
probability by argument from the first.1 The relation, however,
also exists when the first set is not known to be true and is
hypothetical.

In a symbolical treatment of the subject it is important
that we should be free to consider hypothetical premisses, and
to take account of relations of probability as existing between
any pair of sets of propositions, whether or not the premiss is
actually part of knowledge. But in acting thus we must be
careful to avoid two possible sources of error.

2.  The  first  is  that  which  is  liable  to  arise  wherever  variables 
are  concerned.     This  was  mentioned  in  passing  in  §  18  of  Chapter 
IV.     We  must  remember  that  whenever  we  substitute  for  a 
variable  some  particular  value  of  it,  this  may  so  affect  the  relevant 
evidence  as  to  modify  the  probability.     This  danger  is  always 
present  except  where,  as  in  the  first  half  of  Chapter  XIII.,  the 
conclusions  respecting  the  variable  are  certain. 

3. The second difficulty is of a different character. Our
premisses may be hypothetical and not actually the subject of
knowledge. But must they not be possible subjects of knowledge?
How are we to deal with hypothetical premisses which
are self-contradictory or formally inconsistent with themselves,
and which cannot be the subject of rational belief of any degree?

1 Or more strictly, "perception of which, together with knowledge of the
first set, justifies an appropriate degree of rational belief about the second."

Whether or not a relation of probability can be held to exist
between a conclusion and a self-inconsistent premiss, it will be
convenient to exclude such relations from our scheme, so as to
avoid having to provide for anomalies which can have no interest
in an account of the actual processes of valid reasoning. Where
a premiss is inconsistent with itself it cannot be required.

4. Let us term the collection of propositions which are
logically involved in the premisses, in the sense that they follow
from them, or, in other words, stand to them in the relation of
certainty,1 the group specified by the premisses. That is to say,
we define a group as containing all the propositions logically
involved in any of the premisses or in any conjunction of them;
and as excluding all the propositions the contradictories of which
are logically involved in any of the premisses or in any
conjunction of them.2 To say, therefore, that a proposition follows
from a premiss is the same thing as to say that it belongs to the
group which the premiss specifies.

The  idea  of  a  '  group  '  will  then  enable  us  to  define  '  logical 
consistency.'  If  any  part  of  the  premisses  specifies  a  group 
containing  a  proposition,  the  contradictory  of  which  is  contained 
in  a  group  specified  by  some  other  part,  the  premisses  are  logically 
inconsistent ;  otherwise  they  are  logically  consistent.  In  short, 
premisses  are  inconsistent  if  a  proposition  '  follows  from  '  one 
part  of  them,  and  its  contradictory  from  another  part. 

5. We have still, however, to make precise what we mean in
this definition by one proposition following from, or being logically
involved in, the truth of another. We seem to intend by these
expressions some kind of transition by means of a logical principle.
A logical principle cannot be better defined, I think, than in terms
of what in Mr. Russell's Logic of Implication is termed a formal
implication. 'p implies q' is a formal implication if 'not-p or q'
is formally true; and a proposition is formally true if it is a value
of a propositional function in which all the constituents other
than the arguments are logical constants, and of which all the
values are true.

1 'a can be inferred from b,' 'a follows from b,' 'a is certain in relation to
b,' 'a is logically involved in b,' I regard as equivalent expressions, the precise
meaning of which will be defined in succeeding paragraphs. 'a is implied by b'
I use in a different sense, namely, in Mr. Russell's sense, as the equivalent of
'a or not-b.'

2 For the conception of a group, and for many other notions and definitions
in the course of this chapter (those, for example, of a real group and of
logical priority), I am largely indebted to Mr. W. E. Johnson. The origination
of the theory of groups is due to him.
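For the propositional connectives alone, 'formally true' can be tested by brute force. A propositional function of truth-valued arguments is formally true when every one of its values is true. The Python sketch below checks this with a truth table. This is an assumed technique that covers only functions of finitely many truth-valued arguments, not Mr. Russell's full logic with quantifiers. The function names are hypothetical.

```python
from itertools import product
from typing import Callable

def formally_true(fn: Callable[..., bool], arity: int) -> bool:
    """True if every value of the propositional function fn is true,
    i.e. fn holds under every assignment of truth values to its arguments."""
    return all(fn(*args) for args in product([True, False], repeat=arity))

def formal_implication(p: Callable[..., bool], q: Callable[..., bool], arity: int) -> bool:
    """'p implies q' is a formal implication if 'not-p or q' is formally true."""
    return formally_true(lambda *xs: (not p(*xs)) or q(*xs), arity)

# 'x and y' formally implies 'x' ...
print(formal_implication(lambda x, y: x and y, lambda x, y: x, 2))   # True
# ... but 'x' does not formally imply 'x and y'.
print(formal_implication(lambda x, y: x, lambda x, y: x and y, 2))   # False
```

A single material implication depends on the actual truth values of p and q. A formal implication, by contrast, must hold for every value of the arguments.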

We might define a group in such a way that all logical principles
belonged to every group. In this case all formally true
propositions would belong to every group. This definition is logically
precise and would lead to a coherent theory. But it possesses
the defect of not closely corresponding to the methods of reasoning
we actually employ, because not all logical principles are in fact
known to us. And even in the case of those which we do know,
there seems to be a logical order (to which on the above definition
we cannot give a sense) amongst propositions which are about
logical constants and are formally true, just as there is amongst
propositions which are not formally true. Thus, if we were to
assume the premisses in every argument to include all formally
true propositions, the sphere of probable argument would be
limited to what (in contradistinction to formally true propositions)
we may term empirical propositions.

6. For this reason, therefore, I prefer a narrower definition,
which shall correspond more exactly to what we seem to mean
when we say that one proposition follows from another. Let us
define a group of propositions as a set of propositions such that:

(i.) if the proposition 'p is formally true' belongs to the group,
all propositions which are instances of the same formal
propositional function also belong to it;

(ii.) if the proposition p and the proposition 'p implies q'
both belong to it, then the proposition q also belongs to it;

(iii.) if any proposition p belongs to it, then the contradictory
of p is excluded from it.

According to this definition all processes of certain inference
are wholly composed of steps each of which is of one of two simple
types (and if we like we might perhaps regard the first as
comprehending the other). I do not feel certain that these conditions
may not be narrower than what we mean when we say that one
proposition follows from another. But it is not necessary, for the
purpose of defining a group, to dogmatise as to whether any
additional methods of inference are, or are not, open to us. If
we define a group as the propositions logically involved in the
premisses in the above sense, and prescribe that the premisses of
an argument in probability must specify a group not less extensive
than this, we are placing the minimum amount of restriction upon
the form of our premisses. If, sometimes or as a rule, our
premisses in fact include some more powerful principle of
argument, so much the better.

In  the  formal  rules  of  probability  which  follow,  it  will  be 
postulated  that  the  set  of  propositions,  which  form  the  premiss 
of  any  argument,  must  not  be  inconsistent.  The  premiss  must, 
that  is  to  say,  specify  a  '  group  '  in  the  sense  that  no  part  of  the 
premiss  must  exclude  a  proposition  which  follows  from  another 
part.  But  for  this  purpose  we  do  not  need  to  dogmatise  as  to 
what  the  criterion  is  of  inference  or  certainty. 

7. It will be convenient at this point to define a term which
expresses the relation converse to that which exists between a
set of propositions and the group which they specify. The
propositions $p_1 p_2 \ldots p_n$ are said to be fundamental to the group
h if (i.) they themselves belong to the group (which involves their
being consistent with one another); (ii.) between them they
completely specify the group; and (iii.) none of them belongs
to the group specified by the rest (for if $p_r$ belongs to the group
specified by the rest, this term is redundant).

When the fundamental set is uniquely determined, a group h'
is a sub-group of the group h if the set fundamental to h' is
included in the set fundamental to h.

Logically there can be more than one distinct set of
propositions fundamental to a given group; and some extra-logical test
must be applied before the fundamental set is determined uniquely.
On the other hand, a group is completely determined when the
constituent propositions of the fundamental set are given.
Further, any consistent set of propositions evidently specifies
some group, although such a set may contain propositions
additional to those which are fundamental to the group it specifies.
It is clear also that only one group can be specified by a given
set of consistent propositions. The members of a group are,
we may say, rationally bound up with the set of propositions
fundamental to it.
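A two-atom example shows that more than one set can be fundamental to the same group. For simplicity, the Python sketch below assumes that 'follows from' means semantic consequence over truth tables. This is a swapped-in criterion, broader than the two rules of § 6. The propositions `prop_p`, `prop_q` and `p_implies_q` and all function names are illustrative. Under these assumptions the sets {p, 'p implies q'} and {p, q} are each fundamental, and they specify the same group.

```python
from itertools import product

ATOMS = ("p", "q")

def valuations():
    """Yield every assignment of truth values to the atoms."""
    for values in product([True, False], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))

def follows(conclusion, premisses):
    """Semantic consequence: the conclusion holds wherever all premisses hold."""
    return all(conclusion(v) for v in valuations()
               if all(prem(v) for prem in premisses))

def consistent(props):
    """True if some valuation makes every proposition in props true."""
    return any(all(x(v) for x in props) for v in valuations())

def is_fundamental(props):
    """Conditions (i.) and (iii.): consistent, and no member follows from the rest.
    Condition (ii.) is relative to a particular group; see same_group."""
    return consistent(props) and not any(
        follows(x, [y for y in props if y is not x]) for x in props)

def same_group(s1, s2):
    """Two consistent sets specify the same group if each entails the other."""
    return all(follows(x, s2) for x in s1) and all(follows(x, s1) for x in s2)

def prop_p(v):
    return v["p"]

def prop_q(v):
    return v["q"]

def p_implies_q(v):
    return (not v["p"]) or v["q"]

set_one = [prop_p, p_implies_q]
set_two = [prop_p, prop_q]
print(is_fundamental(set_one), is_fundamental(set_two))  # True True
print(same_group(set_one, set_two))                      # True
print(is_fundamental([prop_p, prop_q, p_implies_q]))     # False: redundant member
```

The last line illustrates the parenthesis in condition (iii.) of § 7. When all three propositions are taken together, 'p implies q' belongs to the group specified by the rest and is therefore redundant.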

8. If Mr. Bertrand Russell is right, the whole of pure
mathematics and of formal logic follows, in the sense defined
above, from a small number of primitive propositions. The
group, therefore, which is specified by these primitive propositions
includes the most remote deductions, not only amongst
those known to mathematicians, but amongst those which time
and skill have not yet served to solve. If we define certainty
in a logical and not a psychological sense, it seems necessary,
if our premisses include the essential axioms, to regard as
certain all propositions which follow from these, whether or
not they are known to us. Yet it seems as if there must
be some logical sense in which unproved mathematical
theorems (some of those, for instance, which deal with the
theory of numbers) can be likely or unlikely, and in which a
proposition of this kind, which has been suggested to us by
analogy or supported by induction, can possess an intermediate
degree of probability.

There can be no doubt, I think, that the logical relation of
certainty does exist in these cases in which lack of skill or insight
prevents our apprehending it, in spite of the fact that sufficient
premisses, including sufficient logical principles, are known to us.
In these cases we must say, what we are not permitted to say
when the indeterminacy arises from lack of premisses, that the
probability is unknown. There is still a sense, however, in which
in such a case the knowledge we actually possess can be, in a
logical sense, only probable. While the relation of certainty
exists between the fundamental axioms and every mathematical
hypothesis (or its contradictory), there are other data in relation
to which these hypotheses possess intermediate degrees of
probability. If we are unable through lack of skill to discover
the relation of probability which an hypothesis does in fact bear
towards one set of data, this set is practically useless, and we must
fix our attention on some other set in relation to which the
probability is not unknown. When Newton held that the binomial
theorem possessed for empirical reasons sufficient probability
to warrant a further investigation of it, it was not in relation to
the axioms of mathematics, whether he knew them or not, that
the probability existed, but in relation to his empirical evidence
combined, perhaps, with some of the axioms. There is, in short,
an exception to the rule that we must always consider the
probability of any conclusion in relation to the whole of the data in
our possession. When the relation of the conclusion to the whole
of our evidence cannot be known, then we must be guided by
its relation to some part of the evidence. When, therefore, in
later chapters I speak of a formal proposition as possessing an
intermediate degree of probability, this will always be in relation
to evidence from which the proposition does not logically follow
in the sense defined in § 6.

9. It follows from the preceding definitions that a proposition
is certain in relation to a given premiss, or, in other words, follows
from this premiss, if it is included in the group which that premiss
specifies. It is impossible if it is excluded from the group, that
is to say, if its contradictory follows from the premiss. We
often say, somewhat loosely, that two propositions are
contradictory to one another when they are inconsistent in the sense
that, relative to our evidence, they cannot belong to the same
group. On the other hand, a proposition which is not itself
included in the group specified by the premiss, and whose
contradictory is not included either, has in relation to the premiss an
intermediate degree of probability.

If a follows from h and is, therefore, included in the group
specified by h, this is denoted by $a/h = 1$. The relation of certainty,
that is to say, is denoted by the symbol of unity. The reason
why this notation is useful and has been adopted by common
consent will appear when the meaning of the product of a pair
of relations of probability has been explained. If we represent
the relation of certainty by $I$ and any other probability by
$\alpha$, the product $\alpha \cdot I = \alpha$. Similarly, if a is excluded from the
group specified by h and is impossible in relation to it, this is
denoted by $a/h = 0$. The use of the symbol zero to denote
impossibility arises out of the fact that, if $\omega$ denotes impossibility
and $\alpha$ any other relation of probability, then, in the senses of
multiplication and addition to be defined later, the product
$\alpha \cdot \omega = \omega$, and the sum $\alpha + \omega = \alpha$. Lastly, if a is not included
in the group specified by h, this is written $a/h \neq 1$ or $a/h < 1$;
and if it is not excluded, this is written $a/h \neq 0$ or $a/h > 0$.
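The threefold classification of § 9 can be sketched with truth tables. The Python below assumes, as a simplification, that membership of the group specified by h means semantic consequence from h. The function `relation` is a hypothetical name. It returns the labels '1', '0' or 'intermediate' rather than numbers, since the text does not assume that intermediate probabilities are numerical.

```python
from itertools import product

ATOMS = ("a", "b")

def models(premiss):
    """All valuations (as dicts over ATOMS) in which the premiss holds."""
    result = []
    for values in product([True, False], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if premiss(v):
            result.append(v)
    return result

def relation(prop, premiss):
    """'1' if prop follows from the premiss (it belongs to the group),
    '0' if its contradictory follows (it is excluded from the group),
    otherwise 'intermediate'."""
    ms = models(premiss)
    if not ms:
        raise ValueError("premiss is self-inconsistent and specifies no group")
    if all(prop(v) for v in ms):
        return "1"
    if not any(prop(v) for v in ms):
        return "0"
    return "intermediate"

def prop_a(v):
    return v["a"]

def prop_b(v):
    return v["b"]

def not_b(v):
    return not v["b"]

def premiss_h(v):
    # the premiss 'a, and a implies b'
    return v["a"] and ((not v["a"]) or v["b"])

print(relation(prop_b, premiss_h))   # 1
print(relation(not_b, premiss_h))    # 0
print(relation(prop_b, prop_a))      # intermediate
```

As in the text, the premiss is required to be self-consistent, so an inconsistent premiss is rejected rather than assigned a value.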

10.  The  theory  of  groups  now  enables  us  to  give  an  account, 
with  the  aid  of  some  further  conceptions,  of  logical  priority  and 
of  the  true  nature  of  inference.     The  groups,  to  which  we  refer 
the  arguments  by  which  we  actually  reason,  are  not  arbitrarily 
chosen.     They  are  determined  by  those  propositions  of  which 
we  have  direct  knowledge.     Our  group  of  reference  is  specified 
by  those  direct  judgments  in  which  we   personally  rationally 
certify  the  truth  of  some  propositions  and  the  falsity  of  others. 
So  long  as  it  is  undetermined,  or   not   determined   uniquely, 
which  propositions  are  fundamental,  it  is  not  possible  to  discover 
a necessary order amongst propositions or to show in what way
a  true  proposition  '  follows  from  '  one  true  premiss  rather  than 
another.  But  when  we  have  determined  what  propositions  are 
fundamental,  by  selecting  those  which  we  know  directly  to  be  true, 
or  in  some  other  way,  then  a  meaning  can  be  attached  to  priority 
and  to  the  distinction  between  inference  and  implication.  When 
the  propositions  which  we  know  directly  are  given,  there  is  a 
logical  order  amongst  those  other  propositions  which  we  know 
indirectly  and  by  argument. 

11.  It  will  be  useful  to  distinguish  between  those  groups  which 
are  hypothetical  and  those  of  which  the  fundamental  set  is  known 
to  be  true.  We  will  term  the  former  hypothetical  groups,  and  the 
latter  real  groups.  To  the  real  group,  which  contains  all  the 
propositions  which  are  known  to  be  true,  we  may  assign  the  old 
logical  term  Universe  of  Reference.  While  knowledge  is  here 
taken  as  the  criterion  of  a  real  group,  what  follows  will  be  equally 
valid  whatever  criterion  is  taken,  so  long  as  the  fundamental  set 
is  in  some  manner  or  other  determined  uniquely. 

If it is impossible for us to know a proposition p except by
inference from a knowledge of q, so that we cannot know p to be
true unless we already know q, this may be expressed by saying
that 'p requires q.' More precisely, requirement is defined as
follows:

p does not require q if there is some real group to which p
belongs and q does not belong, i.e. if there is a real group h
such that $p/h = 1$, $q/h \neq 1$; hence

p requires q if there is no real group to which p belongs
and q does not belong.

p does not require q within the group h if the group h, to which
p belongs, contains a subgroup1 h' to which p belongs and q does
not belong; i.e. if there is a group h' such that $h'/h = 1$, $p/h' = 1$,
$q/h' \neq 1$. This reduces to the proposition next but one above
if h is the Universe of Reference. In § 13 these definitions
will be generalised to cover intermediate degrees of probability.
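The definition of requirement quantifies over real groups, so it can be sketched once the real groups are given. In the Python below, the groups and propositions are invented for illustration. Each group is represented simply by the set of propositions it contains, and `requires` is a hypothetical name. If the subgroups of a given group h are passed in place of the real groups, the same test gives requirement within h.

```python
# Invented example: each real group is given as the set of propositions it contains.
REAL_GROUPS = [
    frozenset({"q"}),
    frozenset({"p", "q"}),
    frozenset({"r"}),
]

def requires(p, q, groups=REAL_GROUPS):
    """p requires q if no group in `groups` contains p without containing q.
    With the subgroups h' of a group h as `groups`, this is requirement within h."""
    return not any(p in g and q not in g for g in groups)

print(requires("p", "q"))  # True: every real group containing p also contains q
print(requires("q", "p"))  # False: the group {'q'} contains q but not p
print(requires("r", "q"))  # False: the group {'r'} contains r but not q
```

Note one consequence of the definition as stated. A proposition that belongs to no real group at all requires every proposition, since no group can contain it without containing the other.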

1 Subgroups have only been defined, it must be noticed (see § 7 above), when
the fundamental set of the group has been, in some way, uniquely determined.

12. Inference and logical priority can be defined in terms of
requirement and real groups. It is convenient to distinguish
two types of inference corresponding to hypothetical and real
groups, that is, to cases where the argument is only hypothetical,
and cases where the conclusion can be asserted:

Hypothetical Inference: 'If p, q,' which may also be read
'q is hypothetically inferrible from p,' means that there is a
real group h such that $q/ph = 1$ and $q/h \neq 1$. In order that this
may be the case, ph must specify a group; i.e. $p/h \neq 0$, or, in
other words, p must not be excluded from h. Hypothetical
inference is also equivalent to: 'p implies q,' and 'p implies
q' does not require 'q.' In other words, q is hypothetically
inferrible from p if we know that q is true or p is false, and if
we can know this without first knowing either that q is true or
that p is false.

Assertoric Inference. 'p ∴ q,' which may be read 'p therefore
q' or 'q may be asserted by inference from p,' means that 'if p, q'
is true, and in addition 'p' belongs to a real group; i.e. there
are proper groups h and h' such that $p/h = 1$, $q/ph' = 1$, $q/h' \neq 1$,
and $p/h' \neq 0$.

p  is  prior  to  q  when  p  does  not  require  q,  and  q  requires  p, 
when,  that  is  to  say,  we  can  know  p  without  knowing  q,  but 
not  q  unless  we  first  know  p. 

p  is  prior  to  q  within  the  group  h  when  p  does  not  require  q 
within  the  group,  and  q  does  require  p  within  the  group. 

It  follows  from  this  and  from  the  preceding  definitions  that, 
if  a  proposition  is  fundamental  in  the  sense  that  we  can  only 
know  it  directly,  there  is  no  proposition  prior  to  it ;  and,  more 
generally,  that,  if  a  proposition  is  fundamental  to  a  given 
group,  there  is  no  proposition  prior  to  it  within  the  group. 

13.  We  can  now  apply  the  conception  of  requirement  to 
intermediate  degrees  of  probability.  The  notation  adopted  is, 
it  will  be  remembered,  as  follows  : 

$p/h = \alpha$ means that the proposition p has the probable relation
of degree $\alpha$ to the proposition h; while it is postulated that h is
self-consistent and therefore specifies a group.

$p/h = 1$ means that p follows from h and is, therefore,
included in the group specified by h.

$p/h = 0$ means that p is excluded from the group specified by h.
If h specifies the Universe of Reference, i.e. if its group
comprehends the whole of our knowledge, p/h is called the absolute
probability of p, or (for short) the probability of p; and if $p/h = 1$
and h specifies any real group, p is said to be absolutely certain
or (for short) certain. Thus p is 'certain' if it is a member of a
real group, and a 'certain' proposition is one which we know
to be true. Similarly if $p/h = 0$ under the same conditions, p is
absolutely impossible, or (for short) impossible. Thus an
'impossible' proposition is one which we know to be false.

The  definition  of  requirement,  when  it  is  generalised  so  as  to 
take  account  of  intermediate  degrees  of  probability,  becomes,  it 
will  be  seen,  equivalent  to  that  of  relevance  : 

The probability of p does not require q within the group h, if
there is a subgroup h' such that, for every subgroup h'' which
includes h' and is included in h (i.e. $h'/h'' = 1$, $h''/h = 1$), $p/h'' = p/h'$,
and $q/h' \neq q/h$.

When p is included in the group h, this definition reduces to
the definition of requirement given in §11.

14. The importance of the theory of groups arises as soon as
we admit that there are some propositions which we take for
granted without argument, and that all arguments, whether
demonstrative or probable, consist in the relating of other
conclusions to these as premisses.

The particular propositions, which are in fact fundamental
to the Universe of Reference, vary from time to time and from
person to person. Our theory must also be applicable to
hypothetical Universes. Although a particular Universe of Reference
may be defined by considerations which are partly psychological,
when once the Universe is given, our theory of the relation in
which other propositions stand towards it is entirely logical.

The  formal  development  of  the  theory  of  argument  from 
imposed and limited premisses, which is attempted in the following
chapters, resembles in its general method other parts of formal
logic. We seek to establish implications between our primitive
axioms  and  the  derivative  propositions,  without  specific  reference 
to  what  particular  propositions  are  fundamental  in  our  actual 
Universe  of  Reference. 

It will be seen more clearly in the following chapters that the
laws of inference are the laws of probability, and that the former
is a particular case of the latter. The relation of a proposition to
a group depends upon the relevance to it of the group, and a
group is relevant in so far as it contains a necessary or sufficient
condition of the proposition, or a necessary or sufficient condition
of a necessary or sufficient condition, and so on; a condition
being necessary if every hypothetical group, which includes the
proposition  together  with  the  Universe  of  Reference,  includes 
the  condition,  and  sufficient  if  every  hypothetical  group,  which 
includes  the  condition  together  with  the  Universe  of  Reference, 
includes  the  proposition. 


CHAPTER   XII 

THE    DEFINITIONS    AND    AXIOMS    OF    INFERENCE    AND 
PROBABILITY 

1.  IT  is  not  necessary  for  the  validity  of  what  follows  to  decide 
in  what  manner  the  set  of  propositions  is  determined,  which  is 
fundamental  to  our  Universe  of  Reference,  or  to  make  definite 
assumptions  as  to  what  propositions  are  included  in  the  group 
which  is  specified  by  the  data.  When  we  are  investigating  an 
empirical  problem,  it  will  be  natural  to  include  the  whole  of 
our  logical  apparatus,  the  whole  body,  that  is  to  say,  of 
formal truths which are known to us, together with that part
of our empirical knowledge which is relevant. But in the
following formal developments, which are designed to display
the logical rules of probability, we need only assume that our data
always include those logical rules, of which the steps of our
proofs are instances, together with the axioms relating to
probability which we shall enunciate.

The object of this and the chapters immediately following is
to show that all the usually assumed conclusions in the
fundamental logic of inference and probability follow rigorously from
a few axioms, in accordance with the fundamental conceptions
expounded in Part I. This body of axioms and theorems
corresponds, I think, to what logicians have termed the Laws of
Thought, when they have meant by this something narrower than
the whole system of formal truth. But it goes beyond what has
been usual, in dealing at the same time with the laws of probable,
as well as of necessary, inference.

2. This and the following chapters of Part II. are largely
independent of many of the more controversial issues raised in
the preceding chapters. They do not prejudge the question as
to whether or not all probabilities are theoretically measurable;
and  they  are  not  dependent  on  our  theories  as  to  the  part  played 
by  direct  judgment  in  establishing  relations  of  probability  or 
inference  between  particular  propositions.  Their  premisses  are 
all  hypothetical.  Given  the  existence  of  certain  relations  of 
probability,  others  are  inferred.  Of  the  conclusions  of  Chapter 
III.,  of  the  criteria  of  equiprobability  and  of  inequality  discussed 
in  Chapters  IV.  and  V.,  and  of  the  criteria  of  inference  discussed 
in  §§  5,  6  of  Chapter  XI.,  they  are,  I  think,  wholly  independent. 
They  deal  with  a  different  part  of  the  subject,  not  so  closely 
connected  with  epistemology. 

3. In this chapter I confine myself to Definitions and Axioms.
Propositions will be denoted by small letters, and relations
by capital letters. In accordance with common usage, a
disjunctive combination of propositions is represented by the sign
of addition, and a conjunctive combination by simple
juxtaposition (or, where it is necessary for clearness, by the sign of
multiplication): e.g. 'a or b or c' is written '$a + b + c$,' and 'a
and b and c' is written '$abc$.' '$a + b$' is not so interpreted as to
exclude 'a and b.' The contradictory of a is written $\bar a$.

4.  Preliminary  Definitions  : 

I. If there exists a relation of probability P between the
proposition a and the premiss h,

$a/h = P$ Def.

II. If P is the relation of certainty,¹

$P = 1$ Def.

III. If P is the relation of impossibility,¹

$P = 0$ Def.

IV. If P is a relation of probability, but not the relation of
certainty, $P < 1$. Def.

V. If P is a relation of probability, but not the relation of
impossibility, $P > 0$. Def.

VI. If $a/h = 0$, the conjunction ah is inconsistent. Def.

VII. The class of propositions a such that $a/h = 1$ is the
group specified by h or (for short) the group h. Def.

VIII. If $b/ah = 1$ and $a/bh = 1$, $(a \equiv b)/h = 1$. Def.
This may be regarded as the definition of Equivalence. Thus
we see that equivalence is relative to a premiss h: a is equivalent
to b, given h, if b follows from ah, and a from bh.

¹ These symbols were first employed by Leibnitz. See p. 155 below.
5. Preliminary Axioms:

We shall assume that there is included in every premiss with
which we are concerned the formal implications which allow us
to assert the following axioms:

(i.) Provided that a and h are propositions or conjunctions
of propositions or disjunctions of propositions, and that h is not
an inconsistent conjunction, there exists one and only one relation
of probability P between a as conclusion and h as premiss.
Thus any conclusion a bears to any consistent premiss h one and
only one relation of probability.

(ii.) If $(a \equiv b)/h = 1$ and x is a proposition, $x/ah = x/bh$. This
is the Axiom of Equivalence.

(iii.) $(\overline{a + b} \equiv \bar a \bar b)/h = 1$

$(aa \equiv a)/h = 1$

$(\bar{\bar a} \equiv a)/h = 1$

$(ab + \bar a b \equiv b)/h = 1$

If $a/h = 1$, $ah \equiv h$. That is to say,
if a is included in the group specified by h, h and ah are
equivalent.
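The notions of a group (VII.), of equivalence relative to a premiss (VIII.), and the last clause of axiom (iii.) can be made concrete in a small model. The sketch below rests on an assumption the text itself does not make: propositions are represented as sets of equally weighted 'possible worlds', and a/h is the share of h's worlds in which a holds. The names WORLDS, prob, group and equivalent_given are hypothetical illustrations, not part of the theory.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical model (an assumption, not the author's apparatus): worlds are
# truth-assignments to two atoms; a proposition is the set of worlds where it holds.
WORLDS = frozenset(product([False, True], repeat=2))
ALL_PROPS = [frozenset(c) for r in range(len(WORLDS) + 1)
             for c in combinations(sorted(WORLDS), r)]

def prob(a, h):
    """a/h: share of h's worlds in which a holds (every world has weight 1)."""
    if not h:
        raise ValueError("inconsistent premiss: a/h is not defined")
    return Fraction(len(a & h), len(h))

def group(h):
    """Definition VII: the propositions a such that a/h = 1."""
    return [a for a in ALL_PROPS if prob(a, h) == 1]

def equivalent_given(a, b, h):
    """Definition VIII: (a ≡ b)/h = 1 when b/ah = 1 and a/bh = 1."""
    return prob(b, a & h) == 1 and prob(a, b & h) == 1

p = frozenset(w for w in WORLDS if w[0])  # "the first atom is true"
q = frozenset(w for w in WORLDS if w[1])  # "the second atom is true"

# Equivalence is relative to the premiss: 'p or q' and 'p' are equivalent
# given p, but not given the bare set of all worlds.
print(equivalent_given(p | q, p, p))       # True
print(equivalent_given(p | q, p, WORLDS))  # False

# Last clause of (iii.): every member a of the group p satisfies ap = p.
print(all(a & p == p for a in group(p)))   # True
```

In this model a/h = 1 exactly when h's worlds all lie inside a, so the group h is simply the set of propositions that h entails.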

6. Addition and Multiplication. If we were to assume that
probabilities are numbers or ratios, these operations could be
given their usual arithmetical signification. In adding or
multiplying probabilities we should be simply adding or
multiplying numbers. But in the absence of such an assumption, it
is necessary to give a meaning by definition to these processes.
I shall define the addition and multiplication of relations of
probability only for certain types of such relations. But it
will be shown later that the limitation thus placed on our
operations is not of practical importance.

We define the sum of the probable relations $ab/h$ and $a\bar b/h$
as being the probable relation $a/h$; and the product of the probable
relations $a/bh$ and $b/h$ as being the probable relation $ab/h$. That
is to say:

IX. $ab/h + a\bar b/h = a/h$. Def.

X. $ab/h = a/bh \cdot b/h = b/ah \cdot a/h$. Def.

Before we proceed to the axioms which will make these
symbols operative, the definitions may be restated in more familiar
language. IX. may be read: "The sum of the probabilities
of 'both a and b' and of 'a but not b,' relative to the same
hypothesis, is equal to the probability of 'a' relative to this
hypothesis." X. may be read: "The probability of 'both a and b,'
assuming h, is equal to the product of the probability of b,
assuming h, and the probability of a, assuming both b and h." Or in
the current terminology¹ we should have: "The probability
that both of two events will occur is equal to the probability of
the first multiplied by the probability of the second, assuming
the occurrence of the first." It is, in fact, the ordinary rule for
the multiplication of the probabilities of events which are not
'independent.' It has, however, a much more central position
in the development of the theory than has been usually recognised.
Subtraction and division are, of course, defined as the inverse
operations of addition and multiplication:

XI. If $PQ = R$, $P = R/Q$. Def.

XII. If $P + Q = R$, $P = R - Q$. Def.

Thus  we  have  to  introduce  as  definitions  what  would  be  axioms 
if  the  meaning  of  addition  and  multiplication  were  already  defined. 
In  this  latter  case  we  should  have  been  able  to  apply  the  ordinary 
processes  of  addition  and  multiplication  without  any  further 
axioms.  As  it  is,  we  need  axioms  in  order  to  make  these  symbols, 
to  which  we  have  given  our  own  meaning,  operative.  When 
certain  properties  are  associated,  it  is  often  more  or  less  arbitrary 
which  we  take  as  defining  properties  and  which  we  associate 
with  these  by  means  of  axioms.  In  this  case  I  have  found  it 
more  convenient,  for  the  purposes  of  formal  development,  to 
reverse the arrangement which would come most natural to
common sense, full of preconceptions as to the meaning of addition
and  multiplication.  I  define  these  processes,  for  the  theory  of 
probability,  by  reference  to  a  comparatively  unfamiliar  property, 
and  associate  the  more  familiar  properties  with  this  one  by  means 
of  axioms.  These  axioms  are  as  follows  : 

(iv.) If P, Q, R are relations of probability such that the
products PQ, PR and the sums P + Q, P + R exist, then:

(iv. a) If PQ exists, QP exists, and $PQ = QP$. If $P + Q$ exists,
$Q + P$ exists and $P + Q = Q + P$.

(iv. b) $PQ < P$ unless $Q = 1$ or $P = 0$; $P + Q > P$ unless $Q = 0$.
$PQ = P$ if $Q = 1$ or $P = 0$; $P + Q = P$ if $Q = 0$.

(iv. c) If $PQ \gtreqless PR$, then $Q \gtreqless R$ unless $P = 0$. If $P + Q \gtreqless P + R$,
then $Q \gtreqless R$, and conversely.

¹ E.g. Bertrand, Calcul des probabilités, p. 26.
A meaning has not been given, it is important to notice, to
the signs of addition and multiplication between probabilities
in all cases. According to the definitions we have given, $P + Q$
and $PQ$ have not an interpretation whenever P and Q are
relations of probability, but in certain conditions only.
Furthermore, if $P + Q = R$ and $Q = S + T$, it does not follow that
$P + S + T = R$, since no meaning has been assigned to such an
expression as $P + S + T$. The equation must be written $P + (S + T) = R$,
and we cannot infer from the foregoing axioms that
$(P + S) + T = R$. The following axioms allow us to make this
and other inferences in cases in which the sum $P + S$ exists, i.e.
when $P + S = A$ and A is a relation of probability.

(v.) $[\pm P \pm Q] + [\pm R \pm S] = [\pm P \pm R] - [\mp Q \mp S] = [\pm P \pm R] + [\pm Q \pm S]$

in every case in which the probabilities $[\pm P \pm Q]$, $[\pm R \pm S]$,
$[\pm P \pm R]$, etc., exist, i.e., in which these sums satisfy the
conditions necessary in order that a meaning may be given to them
in the terms of our definition.

(vi.) $P(R \pm S) = PR \pm PS$, if the sum $R \pm S$ and the products
PR and PS exist as probabilities.
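If probabilities did happen to be numbers (precisely the assumption the text avoids), definitions IX. and X. would agree with the familiar rules of arithmetic. The sketch below checks this exhaustively in a hypothetical three-world model with unequal weights, computing a/h as a ratio of weights with exact fractions. It tests X. only where the relevant premiss is consistent, since otherwise a/bh or b/ah has no meaning. The model and every name in it are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical model: three worlds with unequal positive weights (an assumption).
WEIGHT = {"w1": Fraction(1), "w2": Fraction(2), "w3": Fraction(3)}
WORLDS = frozenset(WEIGHT)
PROPS = [frozenset(c) for r in range(len(WORLDS) + 1)
         for c in combinations(sorted(WORLDS), r)]

def measure(s):
    return sum((WEIGHT[w] for w in s), Fraction(0))

def prob(a, h):
    """a/h as a ratio of weights; only called with a consistent premiss h."""
    return measure(a & h) / measure(h)

def neg(a):
    """The contradictory of a: every world in which a fails."""
    return WORLDS - a

def check_ix_and_x(h):
    for a in PROPS:
        for b in PROPS:
            # IX.: ab/h + a(not-b)/h = a/h
            assert prob(a & b, h) + prob(a & neg(b), h) == prob(a, h)
            # X.: ab/h = a/bh . b/h = b/ah . a/h, where the premisses are consistent
            if measure(b & h) > 0:
                assert prob(a & b, h) == prob(a, b & h) * prob(b, h)
            if measure(a & h) > 0:
                assert prob(a & b, h) == prob(b, a & h) * prob(a, h)
    return True

consistent = [h for h in PROPS if measure(h) > 0]
print(len(consistent), all(check_ix_and_x(h) for h in consistent))  # 7 True
```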

7. From these axioms it is possible to derive a number of
propositions respecting the addition and multiplication of
probabilities. They enable us to prove, for instance, that if
$P + Q = R + S$ then $P - R = S - Q$, provided that the differences $P - R$
and $S - Q$ exist; and that
$(P + Q)(R + S) = (P + Q)R + (P + Q)S = [PR + QR] + [PS + QS] = [PR + QS] + [QR + PS]$,
provided that the sums and products in question exist. In general any
rearrangement which would be legitimate in an equation between
arithmetic quantities is also legitimate in an equation between
probabilities, provided that our initial equation and the equation
which finally results from our symbolic operations can both be
expressed in a form which contains only products and sums which
have an interpretation as probabilities in accordance with the
definitions. If, therefore, this condition is observed, we need not
complicate our operations by the insertion of brackets at every
stage, and no result can be obtained as a result of leaving them
out, if it is of the form prescribed above, which could not be
obtained if they had been rigorously inserted throughout. We
can only be interested in our final results when they deal with
actually existent and intelligible probabilities (for our object is,
always, to compare one probability with another), and we are
not incommoded, therefore, in our symbolic operations by the
circumstance that sums and products do not exist between
every pair of probabilities.
8. Independence:

XIII. If $a_1/a_2h = a_1/h$ and $a_2/a_1h = a_2/h$, the probabilities
$a_1/h$ and $a_2/h$ are independent. Def.

Thus the probabilities of two arguments having the same
premisses are independent, if the addition to the premisses of the
conclusion of either leaves them unaffected.

Irrelevance:¹

XIV. If $a_1/a_2h = a_1/h$, $a_2$ is irrelevant on the whole, or, for
short, irrelevant to $a_1/h$. Def.

¹ This is repeated for convenience of reference from Chapter IV. §14. It is
only necessary here to take account of irrelevance on the whole, not of the more
precise sense.


CHAPTER    XIII 

THE    FUNDAMENTAL   THEOREMS    OF    NECESSARY    INFERENCE 

1. IN this chapter we shall be mainly concerned with deducing
the existence of relations of certainty or impossibility, given other
relations of certainty or impossibility; with the rules, that is to
say, of Certain or, as De Morgan termed it, of Necessary Inference.
But it will be convenient to include here a few theorems dealing
with intermediate degrees of probability. Except in one or two
important cases I shall not trouble to translate these theorems
from the symbolism in which they are expressed, since their
interpretation presents no difficulty.

2. (1) $a/h + \bar a/h = 1$.

For $ab/h + \bar a b/h = b/h$ by IX.,

$a/bh \cdot b/h + \bar a/bh \cdot b/h = b/h$ by X.

Put $b/h = 1$, then $a/bh + \bar a/bh = 1$ by (iv. b),

since $b/h = 1$, $bh \equiv h$ by (iii.).

Thus $a/h + \bar a/h = 1$ by (ii.).

(1.1) If $a/h = 1$, $\bar a/h = 0$.

$a/h + \bar a/h = 1$ by (1).

∴ $a/h + \bar a/h = a/h = a/h + 0$ by (iv. b).

∴ $\bar a/h = 0$ by (iv. c).

(1.2) Similarly, if $\bar a/h = 1$, $a/h = 0$.

(1.3) If $a/h = 0$, $\bar a/h = 1$.

$a/h + \bar a/h = 1$,

∴ $\bar a/h + 0 = 1$,

∴ $\bar a/h = 1$.

(1.4) Similarly, if $\bar a/h = 0$, $a/h = 1$.

(2) $a/h < 1$ or $a/h = 1$.

(3) $a/h > 0$ or $a/h = 0$.

∴ there are no negative probabilities.
(4) $ab/h < b/h$ or $ab/h = b/h$ by X. and (iv. b).

(5) If P and Q are relations of probability and $P + Q = 0$,
then $P = 0$ and $Q = 0$.

$P + Q > P$ unless $Q = 0$ by (iv. b),

and $P > 0$ unless $P = 0$ by V.

∴ $P + Q > 0$ unless $Q = 0$.
Hence, if $P + Q = 0$, $Q = 0$, and similarly $P = 0$.

(6) If $PQ = 0$, $P = 0$ or $Q = 0$.

$Q > 0$ unless $Q = 0$ by V.

Hence $PQ > P \cdot 0$ unless $Q = 0$ or $P = 0$ by (iv. c),

i.e. $PQ > 0$ unless $Q = 0$ or $P = 0$ by (iv. b).

Whence, if $PQ = 0$, the result follows.

(7) If $PQ = 1$, $P = 1$ and $Q = 1$.

$PQ < P$ unless $P = 0$ or $Q = 1$ by (iv. b),

$PQ = P$ if $P = 0$ or $Q = 1$ by (iv. b),

and $P < 1$ unless $P = 1$ by IV.,

∴ $PQ < 1$ unless $P = 1$.
Hence $P = 1$; similarly $Q = 1$.

(8) If $a/h = 0$, $ab/h = 0$ and $a/bh = 0$ if bh is not
inconsistent.

For $ab/h = b/ah \cdot a/h = a/bh \cdot b/h$ by X.,

and since $a/h = 0$, $b/ah \cdot a/h = 0$ by (iv. b),

∴ $ab/h = 0$ and $a/bh \cdot b/h = 0$,

∴ unless $b/h = 0$, $a/bh = 0$ by (6),

whence the result by VI.

Thus, if a conclusion is impossible, we may add to the
conclusion or add consistently to the premisses without affecting the
argument.

(9) If $a/h = 1$, $a/bh = 1$ if bh is not inconsistent.

Since $a/h = 1$, $\bar a/h = 0$ by (1.1),

∴ $\bar a/bh = 0$ by (8) if bh is not inconsistent,

whence $a/bh = 1$ by (1.4).

Thus we may add to premisses, which make a conclusion
certain, any other premisses not inconsistent with them, without
affecting the result.
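Theorems (8) and (9) can be spot-checked in a hypothetical finite model of the same kind. The sketch below is an illustration, not a proof, and it assumes that every world carries a positive weight. It runs through every conclusion a, added premiss b and consistent premiss h, and confirms that an impossible conclusion stays impossible, and a certain one stays certain, when a consistent premiss is added.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical model: three worlds, each with positive weight (an assumption).
WEIGHT = {"w1": Fraction(1), "w2": Fraction(2), "w3": Fraction(4)}
WORLDS = frozenset(WEIGHT)
PROPS = [frozenset(c) for r in range(len(WORLDS) + 1)
         for c in combinations(sorted(WORLDS), r)]

def measure(s):
    return sum((WEIGHT[w] for w in s), Fraction(0))

def prob(a, h):
    return measure(a & h) / measure(h)

def check_8_and_9():
    checked = 0
    for h in PROPS:
        if measure(h) == 0:
            continue  # a/h is only defined for a consistent premiss
        for a in PROPS:
            for b in PROPS:
                bh_consistent = measure(b & h) > 0
                if prob(a, h) == 0:  # theorem (8)
                    assert prob(a & b, h) == 0
                    if bh_consistent:
                        assert prob(a, b & h) == 0
                    checked += 1
                if prob(a, h) == 1 and bh_consistent:  # theorem (9)
                    assert prob(a, b & h) == 1
                    checked += 1
    return checked

print(check_8_and_9() > 0)  # True
```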

(10) If $a/h = 1$, $ab/h = b/ah = b/h$.

$ab/h = b/ah \cdot a/h = a/bh \cdot b/h$ by X.

Since $a/h = 1$, $a/bh = 1$ by (9) unless $b/h = 0$,

∴ $b/ah \cdot a/h = b/ah$ and $a/bh \cdot b/h = b/h$ by (iv. b),
whence the result, unless $b/h = 0$.
If $b/h = 0$, the result follows from (8).

(11) If $ab/h = 1$, $a/h = 1$.
For $ab/h = b/ah \cdot a/h$ by X.,

∴ $a/h = 1$ by (7).

(12) If $(a \equiv b)/h = 1$, $a/h = b/h$.

$b/ah \cdot a/h = a/bh \cdot b/h$ by X.,

and $b/ah = 1$, $a/bh = 1$ by VIII.,

∴ $a/h = b/h$ by (iv. b).

(12.1) If $(a \equiv b)/h = 1$ and xh is not inconsistent,
$a/hx = b/hx$.

$a/hx \cdot x/h = x/ah \cdot a/h$

and $b/hx \cdot x/h = x/bh \cdot b/h$ by X.,

$x/ah = x/bh$ by (ii.),

and $a/h = b/h$ by (12),

∴ $a/hx = b/hx$ unless $x/h = 0$.

This is the principle of equivalence. In virtue of it and of
axiom (ii.), if $(a \equiv b)/h = 1$, we can substitute a for b and vice versa,
wherever they occur in a probability whose premisses include h.

(13) $a/a = 1$, unless a is inconsistent.

For $a/a = aa/a = a/aa \cdot a/a$ by (iii.), (12), and X.,

whence $a/aa = 1$ by (iv. b), unless $a/a = 0$,
i.e. $a/a = 1$, unless a is inconsistent, by (iii.), (12), and VI.

(13.1) $\bar a/a = 0$, unless a is inconsistent. This follows from
(13) and (1.1).

(13.2) $a/\bar a = 0$, unless $\bar a$ is inconsistent. This follows from
(iii.) by writing $\bar a$ for a in (13.1).

(14) If $a/b = 0$ and a is not inconsistent, $b/a = 0$.

Let f be the group of assumptions, common to a and b, which
we have supposed to be included in every real group;
then $a/b = a/bf$ and $b/a = b/af$ by (iii.) and (12),

and $a/bf \cdot b/f = b/af \cdot a/f$ by X.

Since $a/bf = 0$ by hypothesis,

and $a/f \neq 0$ since a is not inconsistent,

∴ $b/af = 0$,

whence $b/a = 0$.

Thus, if a is impossible given b, then b is impossible given a.
(15) If $h_1/h_2 = 0$, $h_1h_2/h = 0$.

$h_1h_2/h = h_1/h_2h \cdot h_2/h$ by X.,

and since $h_1/h_2 = 0$, $h_1/h_2h = 0$ by (8), unless $h/h_2 = 0$, whence
the result by (iv. b), unless $h/h_2 = 0$.
If $h/h_2 = 0$, $h_2/h = 0$ by (14),

since we assume that h is not inconsistent, and hence

$h_1h_2/h = 0$ by (8).

Thus, if $h_1$ is impossible given $h_2$, $h_1h_2$ is always impossible and is
excluded from every group.

(15.1) If $h_1h_2/h = 0$ and $h_2h$ is not inconsistent, $h_1/h_2h = 0$.
This, which is the converse of (15), follows from X. and (6).
(16) If $h_1/h_2 = 1$, $(h_1 + \bar h_2)/h = 1$.

$\bar h_1/h_2 = 0$ by (1.1),

∴ $\bar h_1 h_2/h = 0$ by (15),

∴ $\overline{\bar h_1 h_2}/h = 1$ by (1.3),

∴ $(h_1 + \bar h_2)/h = 1$ by (12) and (iii.).

(16.1) We may write (16):

If $h_1/h_2 = 1$, $(h_2 \supset h_1)/h = 1$, where '⊃' symbolises 'implies.'
Thus if $h_1$ follows from $h_2$, then it is always certain that
$h_2$ implies $h_1$.

(16.2) If $(h_1 + \bar h_2)/h = 1$ and $h_2h$ is not inconsistent,
$h_1/h_2h = 1$.

$\bar h_1 h_2/h = 0$, as in (16),

∴ $\bar h_1/h_2h = 0$ by (15.1), since $h_2h$ is not inconsistent,

∴ $h_1/h_2h = 1$ by (1.4).

This is the converse of (16).

(16.3) We may write (16.2):

If $(h_2 \supset h_1)/h = 1$ and $h_2h$ is not inconsistent, $h_1/h_2h = 1$.
Thus, if we define a 'group' as a set of propositions, which follow
from and are certain relatively to the proposition which specifies
them, this proposition proves that, if $h_2 \supset h_1$ and $h_2$ belong to a
group $h_2h$, then $h_1$ also belongs to this group.

(17) If $(h_1 \supset (a \equiv b))/h = 1$ and $h_1h$ is not inconsistent,
$a/hh_1 = b/hh_1$. This follows from (16.3) and (12).

(18) $a/a = 1$ or $\bar a/\bar a = 1$.

$a/a = 1$, unless a is inconsistent, by (13).
If a is inconsistent, $a/h = 0$, where h is not inconsistent, and
therefore $\bar a/h = 1$ by (1.3).

Thus, if a is inconsistent, $\bar a$ is not inconsistent, and therefore

$\bar a/\bar a = 1$ by (13).

(19) $a\bar a/h = 0$.

$a/a = 1$ or $\bar a/\bar a = 1$ by (18),

∴ $\bar a/a = 0$ or $a/\bar a = 0$ by (1.1) and (1.2).
In either case $a\bar a/h = 0$ by (15).
Thus it is impossible that both a and its contradictory
should be true. This is the Law of Contradiction.


(20) $(a + \bar a)/h = 1$.

Since $(a\bar a \equiv \overline{a + \bar a})/h = 1$ by (iii.),

$\overline{a + \bar a}/h = 0$ by (19) and (12),

∴ $(a + \bar a)/h = 1$ by (1.3).

Thus it is certain that either a or its contradictory is true. This
is the Law of Excluded Middle.
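In a hypothetical set-of-worlds model (an assumption made only for illustration), theorem (1) and the Laws of Contradiction (19) and Excluded Middle (20) become statements about complements: a and its contradictory never share a world, and together they cover every world. A minimal check over all propositions and all consistent premisses:

```python
from fractions import Fraction
from itertools import combinations

WORLDS = frozenset(range(4))  # hypothetical: four equally weighted worlds
PROPS = [frozenset(c) for r in range(len(WORLDS) + 1)
         for c in combinations(sorted(WORLDS), r)]

def prob(a, h):
    return Fraction(len(a & h), len(h))

def neg(a):
    """The contradictory of a."""
    return WORLDS - a

def laws_hold():
    for h in PROPS:
        if not h:
            continue  # skip the inconsistent premiss
        for a in PROPS:
            assert prob(a, h) + prob(neg(a), h) == 1   # (1)
            assert prob(a & neg(a), h) == 0            # (19) Contradiction
            assert prob(a | neg(a), h) == 1            # (20) Excluded Middle
    return True

print(laws_hold())  # True
```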

(21) If $a/h_1 = 1$ and $a/h_2 = 0$, $h_1h_2/h = 0$.

For $a/h_1h_2 \cdot h_1/h_2 = h_1/ah_2 \cdot a/h_2$

and $\bar a/h_1h_2 \cdot h_2/h_1 = h_2/\bar a h_1 \cdot \bar a/h_1$ by X.,

∴ $a/h_1h_2 \cdot h_1/h_2 = 0$ and $\bar a/h_1h_2 \cdot h_2/h_1 = 0$,
since, by hypothesis and (1), $\bar a/h_1 = 0$ and $a/h_2 = 0$,

∴ $a/h_1h_2 = 0$ or $h_1/h_2 = 0$,
and $a/h_1h_2 = 1$ or $h_2/h_1 = 0$,

∴ $h_1/h_2 = 0$ or $h_2/h_1 = 0$.
In either case $h_1h_2/h = 0$ by (15).

Thus, if a proposition is certain relatively to one set of
premisses, and impossible relatively to another set, the two sets
are incompatible.

(22) If $a/h_1 = 0$ and $a/h = 1$, $h_1/h = 0$.

$ah_1/h = 0$ by (15), ∴ $a/h_1h \cdot h_1/h = 0$,

$a/h_1h = 1$ by (9), unless $h_1/h = 0$.

∴ in any case $h_1/h = 0$.

(23) If $b/a = 0$ and $b/\bar a = 0$, $b/h = 0$.

$ab/h = 0$ and $\bar a b/h = 0$ by (15),
∴ $a/bh = 0$ or $b/h = 0$,

and $\bar a/bh = 0$ or $b/h = 0$ by X. and (6),

whence $b/h = 0$ by (1.4).


CHAPTER  XIV 

THE    FUNDAMENTAL   THEOREMS    OF   PROBABLE    INFERENCE 

1.  I  SHALL  give  proofs  in  this  chapter  of  most  of  the  fundamental 
theorems  of  Probability,  with  very  little  comment.  The  bearing 
of  some  of  them  will  be  discussed  more  fully  in  Chapter  XVI. 

2.  The  Addition  Theorems  : 

(24) $(a + b)/h = a/h + b/h - ab/h$.

In IX. write $(a + b)$ for a, and $\bar a b$ for b.
Then $(a + b)\bar a b/h + (a + b)\overline{\bar a b}/h = (a + b)/h$,

whence $\bar a b/h + (a + b)(a + \bar b)/h = (a + b)/h$ by (iii.),

$\bar a/bh \cdot b/h + a/h = (a + b)/h$, by (iii.) and X.
That is to say, $(a + b)/h = a/h + (1 - a/bh) \cdot b/h$,
$= a/h + b/h - ab/h$.

In accordance with the principles of Chapter XII. §6, this
should be written, strictly, in the form $a/h + (b/h - ab/h)$, or in
the form $b/h + (a/h - ab/h)$. The argument is valid, since the
probability $(b/h - ab/h)$ is equal to $\bar a b/h$, as appears from the
preceding proof, and, therefore, exists. This important theorem
gives the probability of 'a or b' relative to a given hypothesis
in terms of the probabilities of 'a', 'b' and 'a and b' relative to
the same hypothesis.
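A worked numerical instance of (24), under the hypothetical assumption that h is 'a fair six-sided die is thrown' and that each face is equally probable given h. Here a is 'the throw is even' and b is 'the throw exceeds four'. The sum is bracketed as Chapter XII. §6 requires, and the bracket is confirmed to equal $\bar a b/h$, so it exists as a probability.

```python
from fractions import Fraction

# Hypothetical premiss h: "a fair six-sided die is thrown", faces equally probable.
H = frozenset(range(1, 7))

def prob(a, h=H):
    return Fraction(len(a & h), len(h))

a = frozenset({2, 4, 6})  # "the throw is even"
b = frozenset({5, 6})     # "the throw exceeds four"
not_a = H - a

lhs = prob(a | b)
rhs = prob(a) + (prob(b) - prob(a & b))  # bracketed as Chapter XII. §6 requires
print(lhs, rhs)                           # 2/3 2/3
# The bracketed difference exists because it equals (not-a)b/h:
print(prob(b) - prob(a & b) == prob(not_a & b))  # True
```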

(24.1) If $ab/h = 0$, i.e. if a and b are exclusive alternatives
relative to the hypothesis, then

$(a + b)/h = a/h + b/h$.

This is the ordinary rule for the addition of the probabilities of
exclusive alternatives.

(24.2) $ab/h + \bar a b/h = b/h$,

since $ab + \bar a b \equiv b$ by (iii.),

and $a\bar a b/h = 0$ by (19) and (8).

(24.3) $(a + b)/h = a/h + b\bar a/h$. This follows from (24) and
(24.2).
(24.4) $(a + b + c)/h = (a + b)/h + c/h - (ac + bc)/h$

$= a/h + b/h + c/h - ab/h - bc/h - ca/h + abc/h$.

(24.5) And in general

$(p_1 + p_2 + \dots + p_n)/h = \sum p_r/h - \sum p_r p_s/h + \dots + (-1)^{n-1} p_1 p_2 \dots p_n/h$.
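The general formula (24.5) is the inclusion-exclusion rule. The sketch below, which assumes for illustration a hypothetical fair twelve-sided die as h, adds up the signed probabilities of all conjunctions of one, two and three of the propositions and compares the result with the probability of their disjunction computed directly.

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations

H = frozenset(range(1, 13))  # hypothetical premiss: a fair twelve-sided die

def prob(a, h=H):
    return Fraction(len(a & h), len(h))

def inclusion_exclusion(props, h=H):
    """Right-hand side of (24.5): signed sum over all non-empty conjunctions."""
    total = Fraction(0)
    for k in range(1, len(props) + 1):
        sign = 1 if k % 2 == 1 else -1
        for combo in combinations(props, k):
            total += sign * prob(reduce(frozenset.intersection, combo), h)
    return total

p1 = frozenset(n for n in H if n % 2 == 0)  # "even"
p2 = frozenset(n for n in H if n % 3 == 0)  # "a multiple of three"
p3 = frozenset(n for n in H if n > 8)       # "greater than eight"
union = p1 | p2 | p3

print(inclusion_exclusion([p1, p2, p3]), prob(union))  # 3/4 3/4
```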

(24.6) If $p_r p_s/h = 0$ for all pairs of values of r and s, it follows
by repeated application of X. that

$(p_1 + p_2 + \dots + p_n)/h = p_1/h + p_2/h + \dots + p_n/h$.

(24.7) If $p_r p_s/h = 0$, etc., and $(p_1 + p_2 + \dots + p_n)/h = 1$, i.e.
if $p_1, p_2, \dots, p_n$ form, relatively to h, a set of exclusive and
exhaustive alternatives, then

$p_1/h + p_2/h + \dots + p_n/h = 1$.
(25) If $p_1, p_2, \dots, p_n$ form, relative to h, a set of exclusive
and exhaustive alternatives,

$a/h = \sum_r a/p_r h \cdot p_r/h$.

Since $(p_1 + p_2 + \dots + p_n)/h = 1$ by hypothesis,

∴ $(p_1 + p_2 + \dots + p_n)/ah = 1$ by (9) if ah is not inconsistent;
and since $p_r p_s/h = 0$ by hypothesis,

∴ $p_r p_s/ah = 0$ by (8), if ah is not inconsistent.

Hence $\sum p_r/ah = (p_1 + p_2 + \dots + p_n)/ah = 1$ by (24.6).

Also $p_r/ah \cdot a/h = a/p_r h \cdot p_r/h$ by X.
Summing, $\sum p_r/ah \cdot a/h = \sum a/p_r h \cdot p_r/h$,

∴ $a/h = \sum a/p_r h \cdot p_r/h$, if ah is not inconsistent.

If ah is inconsistent, i.e. if $a/h = 0$ (for h is by hypothesis
consistent), the result follows at once by (8).

(25.1) If $a/p_r h = x_r$, the above may be written

$a/h = \sum_r x_r \cdot p_r/h$.
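Theorem (25) is the rule now usually called the law of total probability. The sketch below checks it in a hypothetical urn model: the premiss h is taken to be 'one of three urns is chosen and a ball is drawn from it', the three alternatives $p_r$ are 'the ball came from urn r', and a is 'the ball is white'. All weights are invented for illustration only.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical premiss h: "one of three urns is chosen and a ball drawn from it".
# Each world is (urn, colour) with an assumed weight; the weights sum to 1.
WEIGHT = {
    ("U1", "white"): F(1, 8), ("U1", "black"): F(3, 8),
    ("U2", "white"): F(1, 6), ("U2", "black"): F(1, 6),
    ("U3", "white"): F(1, 6),
}
H = frozenset(WEIGHT)

def measure(s):
    return sum((WEIGHT[w] for w in s), F(0))

def prob(a, h):
    return measure(a & h) / measure(h)

# p_r: "the ball came from urn r"
alternatives = [frozenset(w for w in H if w[0] == u) for u in ("U1", "U2", "U3")]
assert frozenset().union(*alternatives) == H                       # exhaustive
assert all(not (p & q) for p, q in combinations(alternatives, 2))  # exclusive

a = frozenset(w for w in H if w[1] == "white")  # "the ball is white"
direct = prob(a, H)
via_25 = sum((prob(a, p & H) * prob(p, H) for p in alternatives), F(0))
print(direct, via_25)  # 11/24 11/24
```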
(26) $a/h = (a + \bar h)/h$.

For $(a + \bar h)/h = a/h + \bar h/h - a\bar h/h$ by (24),

$= a/h$ by (13.1) and (8).

(26.1) This may be written

$a/h = (h \supset a)/h$.

(27)  If  («  +  &)//i=0,  a/h=Q. 

a/k  +  [l/h-al/h]  =0,  by  (24)  and  hypothesis 

.-.  «///-  =  0  by  (v.). 

(27.1)  If  a/h=Q  and  Z///t=0,  (a  +  b)/h  =  Q.  This  follows 
from  (24). 

(28) If $a/h = 1$, $(a+b)/h = 1$,

$(a+b)/h = a/h + b\bar a/h$ by (24.3),

whence $(a+b)/h = a/h = 1$ by (1.1) and (8), together with the
hypothesis. That is to say, a certain proposition is implied by
every proposition.

(28.1) If $a/h = 0$, $(\bar a + b)/h = 1$ by substituting $\bar a$ for a
in (28). That is to say, a certainly false proposition
implies every proposition.

(29) If $a/(h_1+h_2) = 1$, $a/h_1 = 1$.

$\bar a/(h_1+h_2) = 0$,

and $\therefore \bar a(h_1+h_2)/h_1 = 0$ by (15).

Hence $\bar a/h_1 = 0$ by (27),

whence the result.

(29.1) If $a/h_1 = 1$ and $a/h_2 = 1$, $a/(h_1+h_2) = 1$.
As in (29), $\bar ah_1/(h_1+h_2) = 0$ and $\bar ah_2/(h_1+h_2) = 0$.

Hence $\bar a(h_1+h_2)/(h_1+h_2) = 0$ by (27.1),

whence the result.

(29.2) If $a/(h_1+h_2) = 0$, $a/h_1 = 0$. This follows from (29).

(29.3) If $a/h_1 = 0$ and $a/h_2 = 0$, $a/(h_1+h_2) = 0$. This follows
from (29.1).

3. Irrelevance and Independence :

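(30) If $a/hh_1 = a/h$, then $a/h\bar h_1 = a/h$ if $h\bar h_1$ is not inconsistent.

$a/h = ah_1/h + a\bar h_1/h$ by (24.2)
$= a/hh_1 \cdot h_1/h + a/h\bar h_1 \cdot \bar h_1/h$ by X.
$= a/h \cdot h_1/h + a/h\bar h_1 \cdot \bar h_1/h$ by hypothesis,
$\therefore a/h \cdot \bar h_1/h = a/h\bar h_1 \cdot \bar h_1/h$, since $h_1/h + \bar h_1/h = 1$;

whence $a/h = a/h\bar h_1$, unless $\bar h_1/h = 0$, i.e. if $h\bar h_1$ is not inconsistent.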
Thus, if a proposition is irrelevant to an argument, then the
contradictory  of  the  proposition  is  also  irrelevant. 

(31) If $a_2/a_1h = a_2/h$ and $a_2h$ is not inconsistent, $a_1/a_2h = a_1/h$.

This follows by (iv. c), since $a_1/a_2h \cdot a_2/h = a_2/a_1h \cdot a_1/h$ by X.
If, that is to say, $a_1$ is irrelevant to the argument $a_2/h$ (see
XIV.), and $a_2$ is not inconsistent with h: then $a_2$ is irrelevant
to the argument $a_1/h$; and $a_1/h$ and $a_2/h$ are independent
(see XIII.).

4. Theorems of Relevance :

(32) If $a/hh_1 > a/h$, $h_1/ah > h_1/h$.
ah is consistent since, otherwise, $a/hh_1 = a/h = 0$.
Therefore $a/h \cdot h_1/ah = a/hh_1 \cdot h_1/h$ by X.,

$> a/h \cdot h_1/h$ by hypothesis:

so that $h_1/ah > h_1/h$.

Thus if $h_1$ is favourably relevant to the argument a/h, a is
favourably relevant to the argument $h_1/h$.

This constitutes a formal demonstration of the generally
accepted principle that if a hypothesis helps to explain a
phenomenon, the fact of the phenomenon supports the reality
of the hypothesis.
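Theorem (32) can be checked exhaustively on a small grid. The check assumes that the probabilities are numerically measurable and given by a 2-by-2 joint table for a and $h_1$ on the premiss h. The Python sketch is illustrative only: `check_32` is a hypothetical helper name, and the grid of eighths is an arbitrary choice.

```python
from fractions import Fraction
from itertools import product


def check_32(p_ah1, p_a_noth1, p_nota_h1, p_nota_noth1):
    """True if the implication of (32) holds for this joint table.

    Assumption: the cells are P(a h1), P(a not-h1), P(not-a h1) and
    P(not-a not-h1) on the premiss h, summing to 1.
    """
    p_a = p_ah1 + p_a_noth1          # a/h
    p_h1 = p_ah1 + p_nota_h1         # h1/h
    if p_a == 0 or p_h1 == 0:
        return True                  # a/hh1 or h1/ah undefined; (32) not engaged
    a_given_h1 = p_ah1 / p_h1        # a/hh1
    h1_given_a = p_ah1 / p_a         # h1/ah
    if a_given_h1 > p_a:             # h1 favourably relevant to a/h ...
        return h1_given_a > p_h1     # ... then a favourably relevant to h1/h
    return True


# Every table whose cells are multiples of 1/8 and sum to 1.
N = 8
tables = [tuple(Fraction(c, N) for c in cells)
          for cells in product(range(N + 1), repeat=4) if sum(cells) == N]
print(len(tables), all(check_32(*t) for t in tables))  # 165 True
```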

In the following theorems p will be said to be more
favourable to a/h than q is to b/h, if $\frac{a/hp}{a/h} > \frac{b/hq}{b/h}$, i.e. if, in the
language of § 8 below, the coefficient of influence of p on a/h
is greater than the coefficient of influence of q on b/h.

(33) If x is favourable to a/h, and $h_1$ is not less favourable
to a/hx than x is to $a/hh_1$, then $h_1$ is favourable to a/h.

For $a/hh_1 = a/h \cdot \frac{a/hx}{a/h} \cdot \frac{a/hh_1x}{a/hx} \cdot \frac{a/hh_1}{a/hh_1x}$; and by hypothesis the
second term on the right is greater than unity and the product
of the third and fourth terms is greater than or equal
to unity.

(33.1) A fortiori, if x is favourable to a/h and not favourable
to $a/hh_1$, and if $h_1$ is not unfavourable to a/hx, then $h_1$ is
favourable to a/h.

(34) If x is favourable to a/h, and $h_1$ is not less favourable
to x/ha than x is to $h_1/ha$, then $h_1$ is favourable to a/h.

This follows by the same reasoning as (33), since by an
application of the Multiplication Theorem
$\frac{a/hh_1x}{a/hx} \cdot \frac{a/hh_1}{a/hh_1x} = \frac{x/hh_1a}{x/ha} \cdot \frac{h_1/ha}{h_1/hax}.$

(35) If x is favourable to a/h, but not more favourable to it
than $h_1x$ is, and not less favourable to it than to $a/hh_1$, then
$h_1$ is favourable to a/h.

For $a/hh_1 = a/h \cdot \left\{\frac{a/h}{a/hx} \cdot \frac{a/hh_1x}{a/h}\right\} \cdot \left\{\frac{a/hx}{a/h} \cdot \frac{a/hh_1}{a/hh_1x}\right\}$.

This result is a little more substantial than the two
preceding. By judging the influence of x and $h_1x$ on the
arguments a/h and $a/hh_1$ we can infer the influence of $h_1$ by
itself on the argument a/h.

5.  The  Multiplication  Theorems  : 

(36) If $a_1/h$ and $a_2/h$ are independent, $a_1a_2/h = a_1/h \cdot a_2/h$.

For $a_1a_2/h = a_1/a_2h \cdot a_2/h = a_2/a_1h \cdot a_1/h$ by X.,

and since $a_1/h$ and $a_2/h$ are independent,

$a_1/a_2h = a_1/h$ and $a_2/a_1h = a_2/h$ by XIII.

Therefore $a_1a_2/h = a_1/h \cdot a_2/h$.

Hence, when $a_1/h$ and $a_2/h$ are independent, we can arrive at the
probability of $a_1$ and $a_2$ jointly on the same hypothesis by simple
multiplication of the probabilities $a_1/h$ and $a_2/h$ taken separately.
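Here is a small illustration of (36). It assumes numerically measurable probabilities: two fair dice thrown independently, with h taken to be that description of the throw. The events chosen for $a_1$ and $a_2$ are hypothetical examples. Independence is first tested in the sense of XIII, and only then is the product rule compared with the joint probability.

```python
from fractions import Fraction
from itertools import product

# Illustrative model (an assumption): two fair dice, all 36 ordered pairs
# equally probable on h.
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(pred, given=lambda o: True):
    """pred/given h as a ratio of counts over equally weighted outcomes."""
    base = [o for o in OUTCOMES if given(o)]
    return Fraction(sum(1 for o in base if pred(o)), len(base))


def a1(o):
    return o[0] % 2 == 0      # hypothetical a1: the first die is even


def a2(o):
    return o[1] >= 5          # hypothetical a2: the second die shows 5 or 6


def both(o):
    return a1(o) and a2(o)


# Independence in the sense of XIII: a1/a2h = a1/h and a2/a1h = a2/h.
independent = prob(a1, a2) == prob(a1) and prob(a2, a1) == prob(a2)
print(independent, prob(both), prob(a1) * prob(a2))  # True 1/6 1/6
```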

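(37) If $p_1/h = p_2/p_1h = p_3/p_1p_2h = \cdots = p_n/p_1p_2\cdots p_{n-1}h = p$, then $p_1p_2\cdots p_n/h = p^n$.


For $p_1p_2\cdots p_n/h = p_1/h \cdot p_2/p_1h \cdot p_3/p_1p_2h \cdots$ by repeated
applications of X.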

6.  The  Inverse  Principle  : 


(38) $\frac{a_1/bh}{a_2/bh} = \frac{b/a_1h}{b/a_2h} \cdot \frac{a_1/h}{a_2/h}$, provided bh and $a_2h$ are each consistent.

For $a_1/bh \cdot b/h = b/a_1h \cdot a_1/h$,

and $a_2/bh \cdot b/h = b/a_2h \cdot a_2/h$ by X.,

whence the result follows, since $b/h \neq 0$, unless bh is
inconsistent.

(38.1) If $a_1/h = p_1$, $a_2/h = p_2$, $b/a_1h = q_1$, $b/a_2h = q_2$, and
$a_1/bh + a_2/bh = 1$, then it easily follows that
$a_1/bh = \frac{p_1q_1}{p_1q_1 + p_2q_2}$ and $a_2/bh = \frac{p_2q_2}{p_1q_1 + p_2q_2}$.


(38.2) If $a_1/h = a_2/h$, the above reduces to

$a_1/bh = \frac{q_1}{q_1 + q_2}$ and $a_2/bh = \frac{q_2}{q_1 + q_2}$,

since $a_1/h \neq 0$, unless $a_1h$ is inconsistent.

The proposition is easily extended to the cases in which the
number  of  a's  is  greater  than  two. 

It will be worth while to translate this theorem into familiar
language. Let b represent the occurrence of an event B, $a_1$
and $a_2$ the hypotheses of the existence of two possible causes
$A_1$ and $A_2$ of B, and h the general data of the problem. Then $p_1$
and $p_2$ are the a priori probabilities of the existence of $A_1$ and $A_2$
respectively, when it is not known whether or not the event B
has occurred; $q_1$ and $q_2$, the probabilities that each of the causes
$A_1$ and $A_2$, if it exists, will be followed by the event B. Then
$\frac{p_1q_1}{p_1q_1 + p_2q_2}$ and $\frac{p_2q_2}{p_1q_1 + p_2q_2}$ are the probabilities of the existence
of $A_1$ and $A_2$ respectively after the event, i.e. when, in addition
to our other data, we know that the event B has occurred. The
initial condition, that bh must not be inconsistent, simply ensures
that the problem is a possible one, i.e. that the occurrence of the
event B is on the initial data at least possible.
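The arithmetic of (38.1) and (38.2) can be sketched in Python as follows, extended to any number of causes as the text notes. The sketch assumes that the priors $p_r$ and the probabilities $q_r$ are numerically known, and that the causes are exclusive with posteriors summing to 1. The function name `posterior` and the numbers are hypothetical.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """a_r/bh = p_r q_r / sum_s p_s q_s, as in (38.1), for any number of causes.

    priors[r] stands for p_r = a_r/h and likelihoods[r] for q_r = b/a_r h.
    Assumes exclusive causes whose posteriors sum to 1, and bh consistent.
    """
    joint = [p * q for p, q in zip(priors, likelihoods)]
    total = sum(joint)
    if total == 0:
        raise ValueError("bh is inconsistent: the event B is impossible on h")
    return [j / total for j in joint]


# Hypothetical numbers: cause A1 is a priori twice as likely as A2, but A2 is
# three times as likely to be followed by the event B.
p = [Fraction(2, 3), Fraction(1, 3)]
q = [Fraction(1, 4), Fraction(3, 4)]
print(posterior(p, q))  # [Fraction(2, 5), Fraction(3, 5)]

# (38.2): with equal priors the result reduces to q_r / (q_1 + q_2).
print(posterior([Fraction(1, 2)] * 2, q))  # [Fraction(1, 4), Fraction(3, 4)]
```

Raising an error when the denominator vanishes mirrors the text's condition that bh must not be inconsistent.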

The reason why this theorem has generally been known as
the Inverse Principle of Probability is obvious. The causal
problems to which the Calculus of Probability has been applied
are naturally divided into two classes: the direct, in which, given
the cause, we deduce the effect; the indirect or inverse, in which,
given the effect, we investigate the cause. The Inverse Principle
has been usually employed to deal with the latter class of
problem.

7. Theorems on the Combination of Premisses :

The Multiplication Theorems given above deal with the combination
of conclusions: given $a_1/h$ and $a_2/h$ we considered the
relation of $a_1a_2/h$ to these probabilities. In this paragraph the
corresponding problem of the combination of premisses will be
treated; given $a/h_1$ and $a/h_2$ we shall consider the relation of
$a/h_1h_2$ to these probabilities.


(39) $a/h_1h_2 = \frac{ah_1h_2/h}{ah_1h_2/h + \bar ah_1h_2/h} = \frac{u}{u+v}$,

where u is the a priori probability of the conclusion a and both
hypotheses $h_1$ and $h_2$ jointly, and v is the a priori probability
of the contradictory of the conclusion and both hypotheses $h_1$
and $h_2$ jointly.


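(40) $a/h_1h_2 = \frac{h_2/ah_1 \cdot p}{h_2/ah_1 \cdot p + h_2/\bar ah_1 \cdot (1-p)} = \frac{h_1/ah_2 \cdot q}{h_1/ah_2 \cdot q + h_1/\bar ah_2 \cdot (1-q)}$,
where $p = a/h_1$ and $q = a/h_2$.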

(40.1) If $p = \tfrac{1}{2}$, $a/h_1h_2 = \frac{h_2/ah_1}{h_2/ah_1 + h_2/\bar ah_1}$, and increases with $\frac{h_2/ah_1}{h_2/\bar ah_1}$.

These  results  are  not  very  valuable  and  show  the  need  of  an 
original  method  of  reduction.  This  is  supplied  by  Mr.  W.  E. 
Johnson's  Cumulative  Formula,  which  is  at  present  unpublished 
but  which  I  have  his  permission  to  print  below.1 

8.  It  is  first  of  all  necessary  to  introduce  a  new  symbol.  Let 
us  write 

XV. $a/bh = \{a^hb\}\, a/h$. Def.
We may call $\{a^hb\}$ the coefficient of influence of b upon a on
hypothesis h.

XVI. $\{a^hb\} \cdot \{ab^hc\} = \{a^hb^hc\}$. Def.
and similarly $\{a^hb\} \cdot \{ab^hcd^he\} = \{a^hb^hcd^he\}$.

These coefficients thus belong by definition to a general class of
operators, which we may call separative factors.

(41) $ab/h = \{a^hb\} \cdot a/h \cdot b/h$,

since $ab/h = a/bh \cdot b/h$.

1 The substance of propositions (41) to (49) below is derived in its entirety
from his notes; the exposition only is mine.
Thus we may also call $\{a^hb\}$ the coefficient of dependence between
a and b on hypothesis h.
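This is a numerical sketch of the coefficient $\{a^hb\}$. It assumes numerically measurable probabilities given by a hypothetical 2-by-2 table for a and b on h. The code computes the coefficient from XV., checks (41), and also checks the symmetry $\{a^hb\} = \{b^ha\}$.

```python
from fractions import Fraction

# Hypothetical 2-by-2 table on the premiss h (an assumption):
# P(a b), P(a not-b), P(not-a b), P(not-a not-b).
P_AB, P_A_NOTB, P_NOTA_B, P_NOTA_NOTB = (
    Fraction(3, 10), Fraction(1, 10), Fraction(2, 10), Fraction(4, 10))

p_a = P_AB + P_A_NOTB          # a/h
p_b = P_AB + P_NOTA_B          # b/h
a_given_b = P_AB / p_b         # a/bh
b_given_a = P_AB / p_a         # b/ah

coef_ab = a_given_b / p_a      # {a^h b}, from XV.: a/bh = {a^h b} a/h
coef_ba = b_given_a / p_b      # {b^h a}

assert P_AB == coef_ab * p_a * p_b   # (41): ab/h = {a^h b} . a/h . b/h
assert coef_ab == coef_ba            # the coefficient is symmetric in a and b
print(coef_ab)  # 3/2
```

A coefficient above unity, as here, means that b is favourably relevant to a on h. A coefficient of exactly unity corresponds to independence.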

(41.1) $abc/h = \{a^hb^hc\} \cdot a/h \cdot b/h \cdot c/h$.

For $abc/h = \{ab^hc\} \cdot ab/h \cdot c/h$ by (41),

$= \{ab^hc\} \cdot \{a^hb\} \cdot a/h \cdot b/h \cdot c/h$ by (41),
whence the result by XVI.
(41.2) And in general

$abcd\ldots/h = \{a^hb^hc^hd\ldots\} \cdot a/h \cdot b/h \cdot c/h \cdot d/h \cdots$

(42) $\{a^hb\} = \{b^ha\}$,

since $a/bh \cdot b/h = b/ah \cdot a/h$.
(42.1) $\{a^hb^hc\} = \{a^hc^hb\}$,
since $a/h \cdot b/h \cdot c/h = a/h \cdot c/h \cdot b/h$.

(42.2) And in general we have a commutative rule, by which
the order of the terms may be always commuted.

(43) As a multiplier the separative factor operates so as to
separate the terms that may be associated (or joined) in the
multiplicand.

Thus $\{ab^hcd^he\} \cdot \{a^hb\} = \{a^hb^hcd^he\}$,
for $abcde/h = \{ab^hcd^he\} \cdot ab/h \cdot cd/h \cdot e/h$
$= \{ab^hcd^he\} \cdot \{a^hb\} \cdot a/h \cdot b/h \cdot cd/h \cdot e/h$,
and also $abcde/h = \{a^hb^hcd^he\} \cdot a/h \cdot b/h \cdot cd/h \cdot e/h$.
Similarly (for example)

$\{abc^hd^hef\} \cdot \{ab^hc\} \cdot \{a^hb\} = \{a^hb^hc^hd^hef\}$.

(44) $\{a^hb\} \cdot \{ab\} = \{a^hb\}$.

For $ab/h = \{ab\} \cdot ab/h$.

By a symbolic convention, therefore, we may put $\{ab\} = 1$.

(44.1) If $\{a^hb\} = 1$, it follows that a/h and b/h are
independent arguments; and conversely.

(45) Rule of Repetition: $\{a^ha\} = 1/(a/h)$.

For $aa/h = \{a^ha\} \cdot a/h \cdot a/h$, and $aa/h = a/h$ by (vi.) and (12).

(46) The Cumulative Formula:
$x/ah : x'/ah : x''/ah : \cdots$

$= x/h \cdot a/xh : x'/h \cdot a/x'h : x''/h \cdot a/x''h : \cdots$ by (38).

Take n + 1 propositions a, b, c .... Then by repetition

$x/ah \cdot x/bh \cdot x/ch \cdots : x'/ah \cdot x'/bh \cdot x'/ch \cdots : x''/ah \cdot x''/bh \cdot x''/ch \cdots : \cdots$
$= (x/h)^{n+1}\, a/xh \cdot b/xh \cdot c/xh \cdots : (x'/h)^{n+1}\, a/x'h \cdot b/x'h \cdot c/x'h \cdots : \cdots$

which may be written

$\prod^{n+1} x/ah : \prod^{n+1} x'/ah : \prod^{n+1} x''/ah : \cdots = (x/h)^{n+1} \prod^{n+1} a/xh : (x'/h)^{n+1} \prod^{n+1} a/x'h : \cdots$
Now
$x/habc\ldots : x'/habc\ldots : x''/habc\ldots$

$= x/h \cdot abc\ldots/xh : x'/h \cdot abc\ldots/x'h : \cdots$ by (38),
and

$abc\ldots/xh = \{a^{xh}b^{xh}c^{xh}\ldots\} \prod a/xh$ by (41.2),
$\therefore (x/h)^n \cdot x/habc\ldots : (x'/h)^n \cdot x'/habc\ldots : (x''/h)^n \cdot x''/habc\ldots : \cdots$

$= \{a^{xh}b^{xh}c^{xh}\ldots\}\, x/ah \cdot x/bh \cdot x/ch \cdots : \{a^{x'h}b^{x'h}c^{x'h}\ldots\}\, x'/ah \cdot x'/bh \cdot x'/ch \cdots : \cdots$
which may be written

$(x/h)^n \cdot x/habc\ldots \propto \{a^{xh}b^{xh}c^{xh}\ldots\} \cdot x/ah \cdot x/bh \cdot x/ch \cdots$
where variations of x are involved.

The cumulative formula is to be applied when, having accumulated
the evidence a, b, c ..., we desire to know the comparative
probabilities of the various possible inferences x, x' ... which
may be drawn, and already know determinately the force of
each of the items a, b, c ... separately as evidence for x, x' ....

Besides the factors x/ah, x/bh, etc., we require to know two
other sets of values, viz.: (1) x/h, etc., i.e. the a priori
probabilities of x, etc., and (2) $\{a^{xh}b^{xh}c^{xh}\ldots\}$, etc., i.e. the
coefficients of dependence between a, b, and c ... on hypotheses
xh, etc. It may be remarked that the values $\{a^{xh}b^{xh}c^{xh}\ldots\}$,
$\{a^{x'h}b^{x'h}c^{x'h}\ldots\}$ ... are not in any way related, even when $x' = \bar x$.
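The proportionality in (46) can be checked on a hypothetical joint distribution. The sketch below assumes three exclusive and exhaustive hypotheses x and two items of evidence a and b (so n + 1 = 2 and n = 1), with invented positive weights. For each x it computes both sides of $(x/h)^n \cdot x/hab \propto \{a^{xh}b\} \cdot x/ah \cdot x/bh$ and confirms that their ratio is the same for every x. All helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution (an assumption, not data from the text):
# hypotheses x in {0, 1, 2}, evidence a and b; n + 1 = 2, so n = 1.
RAW = {(x, a, b): 1 + x + 2 * a + 3 * b * x + a * b
       for x, a, b in product(range(3), (0, 1), (0, 1))}
TOTAL = sum(RAW.values())
JOINT = {k: Fraction(v, TOTAL) for k, v in RAW.items()}


def p(pred):
    """Probability on h of the outcomes satisfying pred(x, a, b)."""
    return sum(w for k, w in JOINT.items() if pred(*k))


def cond(pred, given):
    """pred/given h, computed as P(pred and given) / P(given)."""
    return p(lambda *k: pred(*k) and given(*k)) / p(given)


def has_a(x, a, b):
    return a == 1


def has_b(x, a, b):
    return b == 1


def has_ab(x, a, b):
    return a == 1 and b == 1


n = 1
ratios = []
for hyp in range(3):
    def is_x(x, a, b, hyp=hyp):
        return x == hyp
    # {a^{xh} b} = ab/xh / (a/xh . b/xh): dependence of a and b on hypothesis xh
    coef = cond(has_ab, is_x) / (cond(has_a, is_x) * cond(has_b, is_x))
    lhs = p(is_x) ** n * cond(is_x, has_ab)             # (x/h)^n . x/hab
    rhs = coef * cond(is_x, has_a) * cond(is_x, has_b)  # {a^{xh} b} . x/ah . x/bh
    ratios.append(lhs / rhs)

print(len(set(ratios)) == 1)  # True: the two sides agree up to one constant
```

The common ratio works out to $a/h \cdot b/h \,/\, ab/h$, which does not depend on x. This is why the formula yields only comparative probabilities.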

What corresponds to the cumulative formula has been employed,
sometimes, by mathematicians in a simplified form
which is, except under special conditions, incorrect. First, it
has been tacitly assumed that $\{a^{xh}b^{xh}c^{xh}\ldots\}$, $\{a^{x'h}b^{x'h}c^{x'h}\ldots\}$ ...
are all unity: so that

$(x/h)^n\, x/habc\ldots \propto x/ah \cdot x/bh \cdot x/ch \cdots$
Secondly, the factor $(x/h)^n$ has been omitted, so that
$x/habc\ldots \propto x/ah \cdot x/bh \cdot x/ch \cdots$

It  is  this  second  incorrect  statement  of  the  formula  which 
leads  to  the  fallacious  rule  for  the  combination  of  the  testimonies 
of  independent  witnesses  ordinarily  given  in  the  text-books.1 

(46.1) If $abc\ldots/xh = \{a^{xh}b^{xh}c^{xh}\ldots\}\, a/xh \cdot b/xh \cdot c/xh \cdots$
then $(x/h)^n\, x/habc\ldots \propto \{a^{xh}b^{xh}c^{xh}\ldots\} \cdot x/ah \cdot x/bh \cdot x/ch \cdots$

1 See p. 180 below.
This result is exceedingly interesting. Mr. Johnson is the first to
arrive at the simple relation, expressed above, between the direct
and the inverse formulae: viz. that the same coefficient is required
for correcting the simple formulae of multiplication in
both cases. As he remarks, however, while the direct formula
gives the required probability directly by multiplication, the
inverse formula gives only the comparative probability.

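(46.2) If x, x', x'' ... are exclusive and exhaustive alternatives,

$x/habc\ldots = \frac{(x/h)^{-n}\{a^{xh}b^{xh}c^{xh}\ldots\} \prod x/ah}{\sum_{x'} (x'/h)^{-n}\{a^{x'h}b^{x'h}c^{x'h}\ldots\} \prod x'/ah}$,

since $x'/habc\ldots \propto (x'/h)^{-n}\{a^{x'h}b^{x'h}c^{x'h}\ldots\} \prod x'/ah$,

and $\sum x'/habc\ldots = 1$ by (24.7).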

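(47) $\frac{x/habc\ldots}{\bar x/habc\ldots} = \frac{abc\ldots/xh \,/\, (a/xh \cdot b/xh \cdot c/xh \cdots)}{abc\ldots/\bar xh \,/\, (a/\bar xh \cdot b/\bar xh \cdot c/\bar xh \cdots)} \cdot \frac{x/ah \cdot x/bh \cdot x/ch \cdots}{\bar x/ah \cdot \bar x/bh \cdot \bar x/ch \cdots} \cdot \left(\frac{\bar x/h}{x/h}\right)^n$.

For $x/habc\ldots \cdot abc\ldots/h = x/h \cdot abc\ldots/xh$ and $\bar x/habc\ldots \cdot abc\ldots/h = \bar x/h \cdot abc\ldots/\bar xh$ by X.,

$\therefore \frac{x/habc\ldots}{\bar x/habc\ldots} = \frac{x/h \cdot abc\ldots/xh}{\bar x/h \cdot abc\ldots/\bar xh}$,

whence the result, since $\frac{a/xh}{a/\bar xh} = \frac{x/ah}{x/h} \cdot \frac{\bar x/h}{\bar x/ah}$, etc.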


(47.1) The above formula may be written in the condensed
form

$\frac{(x/h)^n\, x/habc\ldots}{(\bar x/h)^n\, \bar x/habc\ldots} = \frac{\{a^{xh}b^{xh}c^{xh}\ldots\}}{\{a^{\bar xh}b^{\bar xh}c^{\bar xh}\ldots\}} \cdot \frac{x/ah \cdot x/bh \cdot x/ch \cdots}{\bar x/ah \cdot \bar x/bh \cdot \bar x/ch \cdots}$.

This follows at once from (46.2), since x and $\bar x$ are exclusive and
exhaustive alternatives. (It is assumed that xh, $\bar xh$, and ah,
etc., are not inconsistent.)



This formula gives x/habc ... in terms of x/ah, x/bh, etc.,
together with the three values x/h, $\{a^{xh}b^{xh}c^{xh}\ldots\}$, and
$\{a^{\bar xh}b^{\bar xh}c^{\bar xh}\ldots\}$.

(48) $\frac{x/habcd\ldots}{\bar x/habcd\ldots} = \frac{x/hbcd\ldots}{\bar x/hbcd\ldots} \cdot \frac{\{a^{xh}bcd\ldots\}}{\{a^{\bar xh}bcd\ldots\}} \cdot \frac{x/ah}{\bar x/ah} \cdot \frac{\bar x/h}{x/h}$
This gives the effect on the odds (prob. x : prob. $\bar x$) of the extra
knowledge a.

(49)  When  several  data  co-operate  as  evidence  in  favour  of  a 
proposition,  they  continually  strengthen  their  own  mutual 
probabilities,  on  the  assumption  that  when  the  proposition 
is  known  to  be  true  or  to  be  false  the  data  jointly  are  not 
counterdependent. 

I.e. if $\{a^{xh}b^{xh}c^{xh}\ldots\}$ and $\{a^{\bar xh}b^{\bar xh}c^{\bar xh}\ldots\}$ are not less than
unity, and $x/kh > x/h$ where k is any of the data a, b, c ..., then
$\{a^hb^hc^hd\ldots\}$, beginning with unity, continually increases, as
the number of its terms is increased.


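$abc\ldots/h = xabc\ldots/h + \bar xabc\ldots/h$ by (24.2)

$= x/h \cdot abc\ldots/xh + \bar x/h \cdot abc\ldots/\bar xh$
$\geq x/h \prod a/xh + \bar x/h \prod a/\bar xh$
(since $\{a^{xh}b^{xh}c^{xh}\ldots\}$ and $\{a^{\bar xh}b^{\bar xh}c^{\bar xh}\ldots\}$ are not less than unity),

$\therefore \{a^hb^hc\ldots\} = \frac{abc\ldots/h}{a/h \cdot b/h \cdot c/h \cdots} \geq x/h\left[\frac{x/ah}{x/h} \cdot \frac{x/bh}{x/h} \cdots\right] + \bar x/h\left[\frac{\bar x/ah}{\bar x/h} \cdot \frac{\bar x/bh}{\bar x/h} \cdots\right]$,
since $a/xh = \frac{x/ah}{x/h} \cdot a/h$, etc.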

We can show that each additional piece of evidence a, b, c ...
increases the value of this expression. For let $x/h \cdot G + \bar x/h \cdot G'$ be
its value when all the evidence up to k exclusive is taken, so that

$x/kh \cdot G + \bar x/kh \cdot G'$

is its value when k is taken. Now $G > G'$ since $x/ah > x/h$, etc.,
and $\bar x/ah < \bar x/h$, etc., by the hypothesis that the evidence
favours x; and for the same reason $x/kh - x/h$, which is equal
to $\bar x/h - \bar x/kh$, is positive.

$\therefore G(x/kh - x/h) > G'(\bar x/h - \bar x/kh)$,
i.e. $x/kh \cdot G + \bar x/kh \cdot G' > x/h \cdot G + \bar x/h \cdot G'$,

whence the result.
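Here is a sketch of (49) under stated assumptions. The values are hypothetical: a prior $x/h = 3/10$, and data which are independent given x and given $\bar x$, so that both coefficients of dependence equal unity. Each datum is true with probability 4/5 on x and 2/5 on $\bar x$. Each datum is then favourable to x, since $x/kh = 6/13 > 3/10$. The function `mutual_coefficient` is an illustrative name; it computes $\{a^hb^hc\ldots\}$ for the first m data.

```python
from fractions import Fraction

# Hypothetical values (an assumption): data independent given x and given
# not-x, so {a^{xh} b^{xh} c ...} = {a^{not-x h} b^{not-x h} c ...} = 1.
PRIOR_X = Fraction(3, 10)
K_GIVEN_X, K_GIVEN_NOT_X = Fraction(4, 5), Fraction(2, 5)


def mutual_coefficient(m):
    """{a^h b^h c ...} for m data: P(all m data) / product of P(each datum)."""
    joint = PRIOR_X * K_GIVEN_X ** m + (1 - PRIOR_X) * K_GIVEN_NOT_X ** m
    single = PRIOR_X * K_GIVEN_X + (1 - PRIOR_X) * K_GIVEN_NOT_X
    return joint / single ** m


coefs = [mutual_coefficient(m) for m in range(1, 6)]
print([float(c) for c in coefs])  # begins at 1.0 and rises with m
```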
(49.1) The above proposition can be generalised for the
case of exclusive alternatives x, x', x'' ... (in place of x, $\bar x$).

For $\{a^hb^hc\ldots\}$

$= x/h \cdot \{a^{xh}b^{xh}c^{xh}\ldots\} \cdot \{a^hx\}\{b^hx\}\{c^hx\}\cdots$
$+ x'/h \cdot \{a^{x'h}b^{x'h}c^{x'h}\ldots\} \cdot \{a^hx'\}\{b^hx'\}\{c^hx'\}\cdots$
$+ \cdots$

from which it follows that, if $\{a^{xh}b^{xh}c^{xh}\ldots\}$, etc. $\geq 1$, and if
$\{a^hx\} - 1$, $\{b^hx\} - 1$, $\{c^hx\} - 1$, etc., have the same sign, then
$\{a^hb^hc\ldots\}$ is increasing (with the number of letters) from unity.
Mr. Johnson describes this result as a generalisation of
the corrected 'middle term fallacy' (see Chap. V. § 4).


APPENDIX 

ON    SYMBOLIC   TREATMENTS   OF    PROBABILITY 

THE use of the symbol 0 for impossibility and 1 for certainty was
first introduced by Leibnitz in a very early pamphlet, entitled
Specimen certitudinis seu demonstrationum in jure, exhibitum in
doctrina conditionum, published in 1665 (vide Couturat, Logique de
Leibnitz, p. 553). Leibnitz represented intermediate degrees of
probability by the sign ½, meaning, however, by this symbol a
variable between 0 and 1.

Several  modern  writers  have  made  some  attempt  at  a  symbolic 
treatment  of  Probability.  But  with  the  exception  of  Boole,  whose 
methods I have discussed in detail in Chapters XV., XVI., and
XVII.,  no  one  has  worked  out  anything  very  elaborate. 

Mr. MacColl published a number of brief notes on Probability of
considerable interest; see especially his Symbolic Logic, Sixth Paper
on the Calculus of Equivalent Statements, and On the Growth and Use
of a Symbolical Language. The conception of probability as a relation
between propositions underlies his symbolism, as it does mine.1 The
probability of a, relative to the a priori premiss h, he writes $\frac{a}{\epsilon}$; and
the probability, given b in addition to the a priori premiss, he writes $\frac{a}{b}$.
Thus $\frac{a}{\epsilon}$ and $\frac{a}{b}$ correspond to my a/h and a/bh. The difference $\frac{a}{b} - \frac{a}{\epsilon}$, the change
in the probability of a brought about by the addition of b to the
evidence, he calls 'the dependence of the statement a upon the statement b,' and denotes it by $\delta\frac{a}{b}$.

1 I did not come across these notes until my own method was considerably
developed. Mr. MacColl has been the first to use the fundamental symbol of
Probability.

Thus $\delta\frac{a}{b} = 0$ where, in my terminology, b is irrelevant to a on evidence h. The multiplication and
addition formulae he gives as follows:


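$\frac{ab}{\epsilon} = \frac{a}{\epsilon} \cdot \frac{b}{a} = \frac{b}{\epsilon} \cdot \frac{a}{b}$,

$\frac{a+b}{\epsilon} = \frac{a}{\epsilon} + \frac{b}{\epsilon} - \frac{ab}{\epsilon}$.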

Also  <  =         ,  where  A  =  a. 

o      r>    a  e 

It is surprising how little use he succeeds in making of these good
results. He arrives, however, at the inverse formula in the shape:


where $c_1 \ldots c_n$ are a series of mutually exclusive causes of the event
v and include all possible causes of it; reaching it as a generalisation
of the proposition

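$\frac{a}{b} = \frac{\frac{a}{\epsilon} \cdot \frac{b}{a}}{\frac{a}{\epsilon} \cdot \frac{b}{a} + \frac{\bar a}{\epsilon} \cdot \frac{b}{\bar a}}$.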

In a paper entitled "Operations in Relative Number with Applications
to the Theory of Probabilities,"1 Mr. B. I. Gilman attempted
a symbolic treatment based on a frequency theory similar to Venn's,
but made more precise and more consistent with itself: "Probability
has  to  do,  not  with  individual  events,  but  with  classes  of  events  ;  and 
not  with  one  class,  but  with  a  pair  of  classes.  —  the  one  containing, 
the  other  contained.  The  latter  being  the  one  with  which  we  are 
principally  concerned,  we  speak,  by  an  ellipsis,  of  its  probability 
without  mentioning  the  containing  class  ;  but  in  reality  probability 
is  a  ratio,  and  to  define  it  we  must  have  both  correlates  given."  But 
Mr. Gilman's symbolic treatment leads to very little. More recently
R. Laemmel, in his Untersuchungen über die Ermittlung von
Wahrscheinlichkeiten, made a beginning on somewhat similar lines; but
in  his  case  also  the  symbolic  treatment  leads  to  no  substantial  results. 

Apart  from  the  writers  mentioned  above,  there  are  a  few  who 
have  incidentally  made  use  of  a  probability  symbol.  It  will  be 
sufficient  to  cite  Czuber.2  He  denotes  the  probability  of  an  event 

1 Published in the volume of Johns Hopkins Studies in Logic.
2   Wahrscheinlichkeitsrechnung,  vol.  i.  pp.  43-48. 
E by W(E), and the probability of the event E given the occurrence of
an event F by $W_F(E)$. He uses this symbol to give $W_F(E) = W_{\bar F}(E)$
as the criterion of the independence of the events E and F ($\bar F$ denoting
the non-occurrence of F); $W_F(E) = 1$, as the expression of the fact
that E is a necessary consequent of F; and one or two other similar
results.

Finally there is in the Bulletin of the Physico-mathematical Society
of Kazan for 1887 a memoir in Russian by Platon S. Poretzki entitled
"A Solution of the General Problem of the Theory of Probability by
Means of Mathematical Logic." I have seen it stated that Schröder
intended to publish ultimately a symbolic treatment of Probability.
Whether he had prepared any manuscript on the subject before his
death I do not know.


CHAPTER  XV 

NUMERICAL    MEASUREMENT   AND    APPROXIMATION    OF 
PROBABILITIES 

1. THE possibility of numerical measurement, mentioned at
the close of Chapter III., arises out of the Addition Theorem
(24.1). In introducing the definitions and the axiom, which are
required in order to make the convention of numerical
measurement operative, we may appear, as in the case of the original
definitions  of  Addition  and  Multiplication,  to  be  arguing  in  an 
artificial  way.  This  appearance  is  due,  here  as  in  Chapter  XII., 
to  our  having  given  the  names  of  addition  and  multiplication  to 
certain  processes  of  compounding  probabilities  in  advance  of 
postulating  that  the  processes  in  question  have  the  properties 
commonly  associated  with  these  names.  As  common  sense  is 
hasty  to  impute  the  properties  as  soon  as  it  hears  the  names,  it 
may  overlook  the  necessity  of  formally  introducing  them. 

2.  The  definitions  and  the  axiom  which  are  needed  in  order 
to  give  a  meaning  to  numerical  measurement  are  the  following  : — 

XVII. $a/h + \{a/h + [a/h + (a/h + \ldots r \text{ terms})]\} = r \cdot a/h$. Def.

XVIII. If $r \cdot a/h = b/f$, then $a/h = \frac{1}{r} \cdot b/f$. Def.

XIX. If $b/f = q \cdot c/g$, then $\frac{1}{r} \cdot b/f = \frac{q}{r} \cdot c/g$. Def.

Thus if b/h = a/h + a/h + ... to r terms, then the probability
b/h is said to be r times the probability a/h; hence if ab/h = 0 and
a/h = b/h, the probability (a+b)/h is twice the probability a/h.
If a and b are exhaustive as well as exclusive alternatives
relatively to h, so that (a+b)/h = 1, since we take the relation of
certainty as our unit, then a/h = b/h = ½.

We  also  need  the  following  axiom  postulating  the  existence  of 
relations  of  probability  corresponding  to  all  proper  fractions  : 

(vii.) If q and r are any finite integers and q < r, there exists
a relation of probability which can be expressed, by means of the
convention of the foregoing definitions, as $\frac{q}{r}$.

3.  From  these  axioms  and  definitions  combined  with  those 
of  Chapter  XII.,  it  is  easy  to  show  (certainty  being  represented 
by  unity  and  impossibility  by  zero)  that  we  can  manipulate 
according  to  the  ordinary  laws  of  arithmetic  the  "  numbers  " 
which  by  means  of  a  special  convention  we  have  thus  introduced 
to  represent  probabilities.  Of  the  kind  of  proofs  necessary 
for  the  complete  demonstration  of  this  the  following  is  given  as 
an  example  : 

(50) If $a/f = \frac{1}{m}$ and $b/f = \frac{1}{n}$, $a/f + b/f = \frac{m+n}{mn}$.

Let the probability $\frac{1}{mn} = P$, which exists by (vii.),

then $n \cdot P = \frac{1}{m} = a/f$ by (XIX.),

and $m \cdot P = \frac{1}{n} = b/f$,

$\therefore a/f + b/f = n \cdot P + m \cdot P$, if this probability exists,

$= P + P \ldots$ to n terms $+ P + P \ldots$ to m terms,
$= P + P \ldots$ to $m + n$ terms,

$= \frac{m+n}{mn}$ by (XIX.).

This probability exists in virtue of (vii.).
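The steps of the proof of (50) can be traced in Python if the conventional "numbers" of XVII. to XIX. are modelled as exact rationals, an assumption made only for illustration. The steps are: P = 1/mn, then n · P = 1/m and m · P = 1/n, and finally the sum of m + n copies of P. `repeated_sum` is a hypothetical helper standing for the repeated addition of XVII.

```python
from fractions import Fraction


def repeated_sum(term, count):
    """Definition XVII read numerically: term + term + ... to `count` terms."""
    total = Fraction(0)
    for _ in range(count):
        total += term
    return total


def check_50(m, n):
    P = Fraction(1, m * n)           # the probability 1/mn of the proof
    a_f = repeated_sum(P, n)         # n . P = 1/m
    b_f = repeated_sum(P, m)         # m . P = 1/n
    assert a_f == Fraction(1, m) and b_f == Fraction(1, n)
    return a_f + b_f == repeated_sum(P, m + n) == Fraction(m + n, m * n)


print(all(check_50(m, n) for m in range(2, 10) for n in range(2, 10)))  # True
```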

4.  Many  probabilities—  in  fact  all  those  which  are  equal  to 
the  probability  of  some  other  argument  which  has  the  same 
premiss  and  of  which  the  conclusion  is  incompatible  with  that 
of  the  original  argument  —  are  numerically  measurable  in  the 
sense  that  there  is  some  other  probability  with  which  they  are 
comparable  in  the  manner  described  above.  But  they  are  not 
numerically measurable in the most usual sense, unless the
probability with which they are thus comparable is the relation
of certainty. The conditions under which a probability a/h is
numerically measurable and equal to $\frac{q}{r}$ are easily seen. It
is necessary that there should exist probabilities $a_1/h_1, a_2/h_2, \ldots, a_q/h_q, \ldots, a_r/h_r$ such that

$a_1/h_1 = a_2/h_2 = \cdots = a_q/h_q = \cdots = a_r/h_r, \quad a/h = \sum_1^q a_s/h_s, \quad \sum_1^r a_s/h_s = 1.$

If $a/h = q_1$ and $b/h = q_2$, it follows from (32) that $ab/h = q_1q_2$

only if a/h and b/h are independent arguments. Unless,
therefore, we are dealing with independent arguments, we cannot
apply  detailed  mathematical  reasoning  even  when  the  individual 
probabilities  are  numerically  measurable.  The  greater  part  of 
mathematical  probability,  therefore,  is  concerned  with  arguments 
which  are  both  independent  and  numerically  measurable. 

5.  It  is  evident  that  the  cases  in  which  exact  numerical 
measurement  is  possible  are  a  very  limited  class,  generally 
dependent  on  evidence  which  warrants  a  judgment  of  equi- 
probability  by  an  application  of  the  Principle  of  Indifference. 
The  fuller  the  evidence  upon  which  we  rely,  the  less  likely  is  it  to 
be  perfectly  symmetrical  in  its  bearing  on  the  various  alternatives, 
and the more likely is it to contain some piece of relevant
information favouring one of them. In actual reasoning, therefore,
perfectly  equal  probabilities,  and  hence  exact  numerical  measures, 
will  occur  comparatively  seldom. 

The  sphere  of  inexact  numerical  comparison  is  not,  however, 
quite  so  limited.  Many  probabilities,  which  are  incapable  of 
numerical  measurement,  can  be  placed  nevertheless  between 
numerical  limits.  And  by  taking  particular  non-numerical 
probabilities  as  standards  a  great  number  of  comparisons  or 
approximate  measurements  become  possible.  If  we  can  place 
a probability in an order of magnitude with some standard
probability, we can obtain its approximate measure by comparison.

This  method  is  frequently  adopted  in  common  discourse. 
When we ask how probable something is, we often put our
question in the form 'Is it more or less probable than so and so?',
where 'so and so' is some comparable and better known
probability. We may thus obtain information in cases where it would
be impossible to ascribe any number to the probability in question.
Darwin was giving a numerical limit to a non-numerical
probability when he said of a conversation with Lyell that he thought
it  no  more  likely  that  he  should  be  right  in  nearly  all  points  than 
that  he  should  toss  up  a  penny  and  get  heads  twenty  times 
running.1  Similar  cases  and  others  also,  where  the  probability 
which  is  taken  as  the  standard  of  comparison  is  itself  non- 
numerical  and  not,  as  in  Darwin's  instance,  a  numerical  one, 
will  readily  occur  to  the  reader. 

A  specially  important  case  of  approximate  comparison  is  that 
of  '  practical  certainty.'  This  differs  from  logical  certainty  since 
its contradictory is not impossible, but we are in practice
completely satisfied with any probability which approaches such
a limit. The phrase has naturally not been used with complete
precision; but in its most useful sense it is essentially
non-numerical: we cannot measure practical certainty in terms of
logical certainty. We can only explain how great practical
certainty is by giving instances. We may say, for instance, that
it is measured by the probability of the sun's rising to-morrow.
The type which we shall be most likely to take will be that of a
well-verified induction.

6.  Most  of  such  comparisons  must  be  based  on  the  principles 
of  Chapter  V.  It  is  possible,  however,  to  develop  a  systematic 
method  of  approximation  which  may  be  occasionally  useful. 
The  theorems  given  below  are  chiefly  suggested  by  some  work 
of Boole's. His theorems were introduced for a different
purpose, and he does not seem to have realised this interesting
application of them; but analytically his problem is identical
with that of approximation.2 This method of approximation
is also substantially the same analytically as that dealt with by
Mr. Yule under the heading of Consistence.3

1 Life and Letters, vol. ii. p. 240.

2 In Boole's Calculus we are apt to be left with an equation of the second
or of an even higher degree from which to derive the probability of the conclusion;
and Boole introduced these methods in order to determine which of the
several roots of his equation should be taken as giving the true solution of the
problem in probability. In each case he shows that that root must be chosen
which lies between certain limits, and that only one root satisfies this condition.
The general theory to be applied in such cases is expounded by him in Chapter
XIX. of The Laws of Thought, which is entitled "On Statistical Conditions."
But the solution given in that chapter is awkward and unsatisfactory, and he
subsequently published a much better method in the Philosophical Magazine
for 1854 (4th series, vol. viii.) under the title "On the Conditions by which the
Solutions of Questions in the Theory of Probabilities are limited."

3 Theory of Statistics, chap. ii.

(51) $xy/h$ always lies between1 $x/h$ and $x/h + y/h - 1$, and
between $y/h$ and $x/h + y/h - 1$.

For $xy/h = x/h - x\bar{y}/h$ by (24.2),

$= x/h - \bar{y}/h \cdot x/\bar{y}h$ by X.

Now $x/\bar{y}h$ lies between 0 and 1 by (2) and (3),

$\therefore\ xy/h$ lies between $x/h$ and $x/h - \bar{y}/h$,
i.e. between $x/h$ and $x/h + y/h - 1$.

As $xy/h \geq 0$, the above limits may be replaced by $x/h$ and 0, if
$x/h + y/h - 1 < 0$.

We thus have limits for $xy/h$, close enough sometimes to be
useful, which are available whether or not $x/h$ and $y/h$ are independent
arguments. For instance, if $y/h$ is nearly certain, $xy/h = x/h$
nearly, quite independently of whether or not $x$ and $y$ are
independent. This is obvious; but it is useful to have a simple
and general formula for all such cases.
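Where $x/h$ and $y/h$ happen to be numerical, the limits of (51), together with the remark that a negative lower limit may be replaced by 0, reduce to a short computation. The following Python sketch is our own illustration, not Boole's or Keynes's; the name `conjunction_bounds` is hypothetical, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def conjunction_bounds(px, py):
    """Limits for xy/h given numerical values px = x/h and py = y/h.

    Following (51): xy/h lies between x/h (and y/h) and x/h + y/h - 1;
    since xy/h >= 0, a negative lower limit is replaced by 0.
    The limits hold whether or not x/h and y/h are independent.
    """
    for p in (px, py):
        if not 0 <= p <= 1:
            raise ValueError("probabilities must lie between 0 and 1")
    lower = max(Fraction(0), px + py - 1)
    upper = min(px, py)  # the tighter of the two upper limits x/h and y/h
    return lower, upper

# y/h nearly certain: xy/h is pinned close to x/h.
print(conjunction_bounds(Fraction(3, 5), Fraction(99, 100)))
```

With $x/h = 3/5$ and $y/h = 99/100$ the limits are 59/100 and 3/5, so $xy/h$ is close to $x/h$ whatever the relation between $x$ and $y$.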

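(52) $x_1x_2\ldots x_n/h$ is always greater than $\sum x_r/h - n$.

For by (51) $x_1x_2\ldots x_{n+1}/h \geq x_1x_2\ldots x_n/h + x_{n+1}/h - 1$

$\geq x_1x_2\ldots x_{n-1}/h + x_n/h + x_{n+1}/h - 2$,
and so on; for $n$ terms the chain ends in $x_1x_2\ldots x_n/h \geq \sum x_r/h - (n-1)$, which is greater than $\sum x_r/h - n$.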
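The chain in the proof of (52) is a repeated application of (51), and can be written as a left fold. The Python sketch below assumes the $x_r/h$ are numerical; the function name is hypothetical. It returns the limit $\sum x_r/h - (n-1)$ reached at the end of the chain. A negative value is uninformative, since a probability cannot fall below 0.

```python
from fractions import Fraction
from functools import reduce

def joint_lower_limit(probs):
    """Lower limit for x1 x2 ... xn / h from the values xr/h, by (51) applied
    repeatedly: at each step the next xr/h is added and 1 is subtracted."""
    if not probs:
        raise ValueError("need at least one probability")
    return reduce(lambda acc, p: acc + p - 1, probs[1:], probs[0])

probs = [Fraction(9, 10), Fraction(19, 20), Fraction(97, 100)]
limit = joint_lower_limit(probs)   # sum - (n - 1)
weaker = sum(probs) - len(probs)   # the limit as stated in (52)
print(limit, weaker)
```

Here the chain guarantees the conjunction a probability of at least 41/50, while the limit $\sum x_r/h - n$ is negative and tells us nothing.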

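(53) $xy/h + \bar{x}\bar{y}/h$ is always less than $x/h - y/h + 1$, and less
than $y/h - x/h + 1$.

For as in (51) $xy/h = x/h - x\bar{y}/h$

and $\bar{x}\bar{y}/h = \bar{y}/h - x\bar{y}/h$,

$\therefore\ xy/h + \bar{x}\bar{y}/h = x/h - y/h + 1 - 2x\bar{y}/h$,
whence the required result, since $x\bar{y}/h \geq 0$; the second limit follows in the same way by interchanging $x$ and $y$.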

(54) $xy/h - \bar{x}\bar{y}/h = x/h + y/h - 1$.

This proposition, which follows immediately from the above,
is really out of place here. But its close connection with conclusions
(51) and (53) is obvious. It is slightly unexpected,
perhaps, that the difference of the probabilities that both of two
events will occur and that neither of them will, is independent of
whether or not the events themselves are independent.
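Since (53) and (54) hold whatever the dependence between $x$ and $y$, they can be spot-checked against arbitrary joint distributions. In this Python sketch (our own check, not part of the text) the four constituents $xy$, $x\bar{y}$, $\bar{x}y$, $\bar{x}\bar{y}$ receive randomly chosen weights; the random scheme is an assumption made purely for testing.

```python
import random
from fractions import Fraction

def check_53_54(trials=1000, seed=0):
    """Spot-check (53) and (54) on random joint distributions of x and y."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Weights for the constituents xy, x(not-y), (not-x)y, (not-x)(not-y).
        w = [rng.randint(0, 20) for _ in range(4)]
        total = sum(w)
        if total == 0:
            continue  # no distribution to test
        p_xy, p_x_ny, p_nx_y, p_nx_ny = (Fraction(k, total) for k in w)
        px = p_xy + p_x_ny   # x/h
        py = p_xy + p_nx_y   # y/h
        # (54): xy/h - (not-x)(not-y)/h = x/h + y/h - 1
        assert p_xy - p_nx_ny == px + py - 1
        # (53): xy/h + (not-x)(not-y)/h is at most x/h - y/h + 1 and y/h - x/h + 1
        assert p_xy + p_nx_ny <= px - py + 1
        assert p_xy + p_nx_ny <= py - px + 1
    return True
```

Dependent and independent distributions alike pass, which is the point of the remark following (54).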

7.  It  is  not  worth  while  to  work  out  more  of  these  results  here. 
Some  less  systematic  approximations  of  the  same  kind  are  given 
in  the  course  of  the  solutions  in  Chapter  XVII. 

In seeking to compare the degree of one probability with that
of another we may desire to get rid of one of the terms, on account
of its not being comparable with any of our standard probabilities.
Thus our object in general is to eliminate a given symbol of
quantity from a set of equations or inequations. If, for instance,
we are to obtain numerical limits within which our probability
must lie, we must eliminate from the result those probabilities
which are non-numerical. This is the general problem for
solution.

1 In this and the following theorems the term 'between' includes the
limits.

(55) A general method of solving these problems when we
can throw our equations into a linear shape so far as all symbols
of probability are concerned, is best shown in the following
example:

Suppose we have

$\lambda + \nu = a$ (i.)

$\lambda + \sigma = b$ (ii.)

$\lambda + \nu + \sigma = c$ (iii.)

$\lambda + \mu + \nu + \rho = d$ (iv.)

$\lambda + \mu + \sigma + \tau = e$ (v.)

$\lambda + \mu + \nu + \rho + \sigma + \tau + \upsilon = 1$ (vi.)

where $\lambda, \mu, \nu, \rho, \sigma, \tau, \upsilon$ represent probabilities which are to be
eliminated, and limits are to be found for $c$ in terms of the
standard probabilities $a, b, d, e$, and 1.

$\lambda, \mu$, etc., must all lie between 0 and 1.

From (i.) and (iii.) $\sigma = c - a$; from (ii.) and (iii.) $\nu = c - b$.

From (i.), (ii.), and (iii.) $\lambda = a + b - c$,
whence $c - a \geq 0$, $c - b \geq 0$, $a + b - c \geq 0$.

Substituting for $\sigma, \nu, \lambda$ in (iv.), (v.), and (vi.),

$\mu + \rho = d - a$, $\mu + \tau = e - b$, $\mu + \rho + \tau + \upsilon = 1 - c$,

whence $\rho = d - a - \mu$, $\tau = e - b - \mu$, $\upsilon = 1 - c - d + a - e + b + \mu$.

We have still to eliminate $\mu$. Since $\rho, \tau, \upsilon \geq 0$,

$\mu \leq d - a$, $\mu \leq e - b$,

$\mu \geq c + d + e - a - b - 1$,

$\therefore\ d - a \geq c + d + e - a - b - 1$ and $e - b \geq c + d + e - a - b - 1$,
i.e. $c \leq 1 - e + b$ and $c \leq 1 - d + a$.

Hence we have:

Upper limits of $c$: $a + b$, $1 - d + a$, $1 - e + b$ (whichever is least);

Lower limits of $c$: $a$, $b$ (whichever is greatest).

This  example,  which  is  only  slightly  modified  from  one  given 
by  Boole,  represents  the  actual  conditions  of  a  well-known 
problem  in  probability. 
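The elimination in (55) can be checked numerically. The Python sketch below (function names hypothetical) encodes the limits just obtained, then draws random non-negative values of $\lambda, \mu, \nu, \rho, \sigma, \tau, \upsilon$ summing to 1, computes $a, b, c, d, e$ from (i.) to (v.), and confirms that $c$ always falls within the limits. The sampling scheme is an arbitrary choice for testing; it shows that the limits are never violated, not that they are the narrowest possible.

```python
import random

def limits_for_c(a, b, d, e):
    """Limits for c obtained in (55): the upper limit is the least of
    a + b, 1 - d + a, 1 - e + b; the lower limit is the greatest of a, b."""
    upper = min(a + b, 1 - d + a, 1 - e + b)
    lower = max(a, b)
    return lower, upper

def random_check(trials=10_000, seed=1, tol=1e-12):
    """Draw random non-negative values for the seven eliminated probabilities,
    normalised so that (vi.) holds, and confirm c respects the limits."""
    rng = random.Random(seed)
    for _ in range(trials):
        raw = [rng.random() for _ in range(7)]
        s = sum(raw)
        lam, mu, nu, rho, sig, tau, ups = (r / s for r in raw)
        a = lam + nu                  # (i.)
        b = lam + sig                 # (ii.)
        c = lam + nu + sig            # (iii.)
        d = lam + mu + nu + rho       # (iv.)
        e = lam + mu + sig + tau      # (v.); ups enters only through (vi.)
        lower, upper = limits_for_c(a, b, d, e)
        if not (lower - tol <= c <= upper + tol):
            return False
    return True
```

The small tolerance only absorbs floating-point rounding; with exact fractions it could be set to zero.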


CHAPTER    XVI 

OBSERVATIONS   ON   THE    THEOREMS    OF   CHAPTER    XIV.   AND   THEIR 
DEVELOPMENTS,  INCLUDING  TESTIMONY 

1. In Definition XIII. of Chapter XII. a meaning was given to
the statement that $a_1/h$ and $a_2/h$ are independent arguments.
In Theorem (33) of Chapter XIV. it was shown that, if $a_1/h$ and
$a_2/h$ are independent, $a_1a_2/h = a_1/h \cdot a_2/h$. Thus where on given
evidence there is independence between $a_1$ and $a_2$, the probability
on this evidence of $a_1a_2$ jointly is the product of the probabilities
of $a_1$ and $a_2$ separately. It is difficult to apply mathematical
reasoning to the Calculus of Probabilities unless this condition
is fulfilled; and the fulfilment of the condition has often been
assumed too lightly. A good many of the most misleading
fallacies in the theory of Probability have been due to a use of
the Multiplication Theorem in its simplified form in cases where
this is illegitimate.

2. These fallacies have been partly due to the absence of
a clear understanding as to what is meant by Independence.
Students of Probability have thought of the independence of
events, rather than of the independence of arguments or propositions.
The one phraseology is, perhaps, as legitimate as the
other; but when we speak of the dependence of events, we are
led to believe that the question is one of direct causal dependence,
two events being dependent if the occurrence of one is a part
cause or a possible part cause of the occurrence of the other. In
this sense the result of tossing a coin is dependent on the existence
of bias in the coin or in the method of tossing it, but it is independent
of the actual results of other tosses; immunity from
smallpox is dependent on vaccination, but is independent of
statistical returns relating to immunity; while the testimonies
of two witnesses about the same occurrence are independent,
so long as there is no collusion between them.
This sense, which it is not easy to define quite precisely, is
at any rate not the sense with which we are concerned when we
deal with independent probabilities. We are concerned, not with
direct causation of the kind described above, but with 'dependence
for knowledge,' with the question whether the knowledge of
one fact or event affords any rational ground for expecting the
existence of the other. The dependence for knowledge of two
events usually arises, no doubt, out of causal connection, or what
we term such, of some kind. But two events are not independent
for knowledge merely because there is an absence of direct causal
connection between them; nor, on the other hand, are they
necessarily dependent because there is in fact a causal train which
brings them into an indirect connection. The question is whether
there is any known probable connection, direct or indirect. A
knowledge of the results of other tossings of a coin may be hardly
less relevant than a knowledge of the bias of the coin; for a
knowledge of these results may be a ground for a probable knowledge
of the bias. There is a similar connection between the
statistics of immunity from smallpox and the causal relations
between vaccination and smallpox. The truthful testimonies
of two witnesses about the same occurrence have a common
cause, namely the occurrence, however independent (in the legal
sense of the absence of collusion) the witnesses may be. For the
purposes of probability two facts are only independent if the
existence of one is no indication of anything which might be a
part cause of the other.

3. While dependence and independence may be thus connected
with the conception of causality, it is not convenient to
found our definition of independence upon this connection. A
partial or possible cause involves ideas which are still obscure, and
I have preferred to define independence by reference to the conception
of relevance, which has been already discussed. Whether
there really are material external causal laws, how far causal
connection is distinct from logical connection, and other such
questions, are profoundly associated with the ultimate problems
of logic and probability and with many of the topics, especially
those of Part III., of this treatise. But I have nothing useful to
say about them. Nearly everything with which I deal can be
expressed in terms of logical relevance. And the relations between
logical relevance and material cause must be left doubtful.
4. It will be useful to give a few examples out of writers who,
as I conceive, have been led into mistakes through misapprehending
the significance of Independence.

Cournot,1 in his work on Probability, which after a long period
of neglect has come into high favour with a modern school of
thought in France, distinguishes between 'subjective probability'
based on ignorance and 'objective probability' based on the
calculation of 'objective possibilities,' an 'objective possibility'
being a chance event brought about by the combination or convergence
of phenomena belonging to independent series. The
existence of objectively chance events depends on his doctrine
that, as there are series of phenomena causally dependent, so
there are others between the causal developments of which there
is independence. These objective possibilities of Cournot's,
whether they be real or fantastic, can have, however, small
importance for the theory of probability. For it is not known
to us what series of phenomena are thus independent. If we had
to wait until we knew phenomena to be independent in this sense
before we could use the simplified multiplication theorem, most
mathematical applications of probability would remain hypothetical.

5. Cournot's 'objective probability,' depending wholly on
objective fact, bears some resemblances to the conception in the
minds of those who adopt the frequency theory of probability.
The proper definition of independence on this theory has been
given most clearly by Mr. Yule2 as follows:

"Two attributes A and B are usually defined to be independent,
within any given field of observation or 'universe,'
when the chance of finding them together is the product of the
chances of finding either of them separately. The physical
meaning of the definition seems rather clearer in a different
form of statement, viz. if we define A and B to be independent
when the proportion of A's amongst the B's of the given universe is
the same as in that universe at large. If, for instance, the question
were put, 'What is the test for independence of smallpox attack
and vaccination?' the natural reply would be, 'The percentage
of vaccinated amongst the attacked should be the same as in
the general population.' ..."
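Yule's test, as quoted, compares two proportions in a fully enumerated universe. A minimal Python sketch, assuming the four cell counts of a 2×2 table are known exactly; the function name and the counts below are invented for illustration, not taken from any statistical return.

```python
from fractions import Fraction

def yule_independent(n_ab, n_a_notb, n_nota_b, n_nota_notb):
    """Yule's frequency test: A and B are independent in the given universe
    when the proportion of A's amongst the B's equals the proportion of A's
    in the universe at large. Arguments are the four cell counts of a 2x2 table."""
    n_a = n_ab + n_a_notb
    n_b = n_ab + n_nota_b
    total = n_a + n_nota_b + n_nota_notb
    if n_b == 0:
        raise ValueError("no B's in the universe; the test is undefined")
    return Fraction(n_ab, n_b) == Fraction(n_a, total)

# A = vaccinated, B = attacked by smallpox; hypothetical counts.
print(yule_independent(20, 180, 30, 270))
print(yule_independent(5, 195, 45, 255))
```

In the first table 2/5 of the attacked are vaccinated, as are 2/5 of the whole population, so the test reports independence; in the second the proportions are 1/10 and 2/5. The test presupposes that these proportions are actually known.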

1  For  some  account  of  Cournot,  see  Chapter  XXIV.  §  3. 

2 "Notes on the Theory of Association of Attributes in Statistics," Biometrika,
vol. ii. p. 125.
This definition is consistent with the rest of the theory
to which it belongs, but is, at the same time, open to the
general objections to it.1 Mr. Yule admits that A and B may be
independent in the world at large but not in the world of C's.
The question therefore arises as to what world given evidence
specifies, and whether any step forward is possible when, as is
generally the case, we do not know for certain what the proportions
in a given world actually are. As in the case of Cournot's
independent series, it is in general impossible that we should
know whether A and B are or are not independent in this sense.
The logical independence for knowledge which justifies our
reasoning in a certain way must be something different from
either of these objective forms of independence.

6. I come now to Boole's treatment of this subject. The
central error in his system of probability arises out of his giving
two inconsistent definitions of 'independence.'2 He first wins
the reader's acquiescence by giving a perfectly correct definition:
"Two events are said to be independent when the
probability of the happening of either of them is unaffected by
our expectation of the occurrence or failure of the other."3 But
a moment later he interprets the term in quite a different sense;
for, according to Boole's second definition, we must regard the
events as independent unless we are told either that they must
concur or that they cannot concur. That is to say, they are independent
unless we know for certain that there is, in fact, an
invariable connection between them. "The simple events, x, y, z,
will be said to be conditioned when they are not free to occur in
every possible combination; in other words, when some compound
event depending upon them is precluded from occurring.

1  See  Chapter  VIII. 

2 Boole's mistake was pointed out, accurately though somewhat obscurely,
by H. Wilbraham in his review "On the Theory of Chances developed in Professor
Boole's Laws of Thought" (Phil. Mag. 4th series, vol. vii., 1854). Boole
failed to understand the point of Wilbraham's criticism, and replied hotly,
challenging him to impugn any individual results ("Reply to some Observations
published by Mr. Wilbraham," Phil. Mag. 4th series, vol. viii., 1854). He
returned to the same question in a paper entitled "On a General Method in
the Theory of Probabilities," Phil. Mag. 4th series, vol. viii., 1854, where he
endeavours to support his theory by an appeal to the Principle of Indifference.
McColl, in his "Sixth Paper on Calculus of Equivalent Statements," saw
that Boole's fallacy turned on his definition of Independence; but I do
not think he understood, at least he does not explain, where precisely Boole's
mistake lay.

3  Laws  of  Thought,  p.  255.     The  italics  in  this  quotation  are  mine. 
. . . Simple unconditioned events are by definition independent."1
In fact as long as xz is possible, x and z are independent. This is
plainly inconsistent with Boole's first definition, with which he
makes no attempt to reconcile it. The consequences of his employing
the term independence in a double sense are far-reaching.
For he uses a method of reduction which is only valid when the
arguments to which it is applied are independent in the first
sense, and assumes that it is valid if they are independent in the
second sense. While his theorems are true if all the propositions
or events involved are independent in the first sense, they are not
true, as he supposes them to be, if the events are independent
only in the second sense. In some cases this mistake involves
him in results so paradoxical that they might have led him
to detect his fundamental error.2 Boole was almost certainly
led into this error through supposing that the data of a
problem can be of the form, "Prob. x = p," i.e. that it is
sufficient to state that the probability of a proposition is such
and such, without stating to what premisses this probability is
referred.3

It is interesting that De Morgan should have given,
incidentally, a definition of independence almost identical
with Boole's second definition: "Two events are independent
if the latter might have existed without the former, or the

1 Op. cit. p. 258.

2 There is an excellent instance of this, Laws of Thought, p. 286. Boole
discusses the problem: Given the probability p of the disjunction 'either Y
is true, or X and Y are false,' required the probability of the conditional proposition,
'If X is true, Y is true.' The two propositions are formally equivalent;
but Boole, through the error pointed out above, arrives at the result

$\frac{cp}{1 - p + cp}$,

where c is the probability of 'If either Y is true, or X and Y false, X is true.'
His explanation of the paradox amounts to an assertion that, so long as two
propositions, which are formally equivalent when true, are only probable, they
are not necessarily equivalent.

3 In studying and criticising Boole's work on Probability, it is very important
to take into account the various articles which he contributed to the
Philosophical Magazine during 1854, in which the methods of The Laws of
Thought are considerably improved and modified. His last and most considered
contribution to Probability is his paper "On the application of the Theory of
Probabilities to the question of the combination of testimonies or judgments,"
to be found in the Edin. Phil. Trans. vol. xxi., 1857. This memoir contains a
simplification and general summary of the method originally proposed in The
Laws of Thought, and should be regarded as superseding the exposition of that
book. In spite of the error already alluded to, which vitiates many of his
conclusions, the memoir is as full as are his other writings of genius and
originality.
former without the latter, for anything that we know to the
contrary."1

7. In many other cases errors have arisen, not through a
misapprehension of the meaning of independence, but merely
through careless assumptions of it, or through enunciating the
Theorem of Multiplication without its qualifying condition.
Mathematicians have been too eager to assume the legitimacy
of those complicated processes of multiplying probabilities, for
which the greater part of the mathematics of probability is
engaged in supplying simplifications and approximate solutions.
Even De Morgan was careless enough in one of his writings2
to enunciate the Multiplication Theorem in the following form:
"The probability of the happening of two, three, or more events
is the product of the probabilities of their happening separately
(p. 398). . . . Knowing the probability of a compound event,
and that of one of its components, we find the probability
of the other by dividing the first by the second. This is a
mathematical result of the last too obvious to require further
proof (p. 401)."

An excellent and classic instance of the danger of wrongful
assumptions of independence is given by the problem of determining
the probability of throwing heads twice in two consecutive
tosses of a coin. The plain man generally assumes without
hesitation that the chance is $(\frac{1}{2})^2$. For the a priori chance of
heads at the first toss is $\frac{1}{2}$, and we might naturally suppose that
the two events are independent, since the mere fact of heads
having appeared once can have no influence on the next toss.
But this is not the case unless we know for certain that the coin
is free from bias. If we do not know whether there is bias, or
which way the bias lies, then it is reasonable to put the probability
somewhat higher than $(\frac{1}{2})^2$. The fact of heads having appeared
at the first toss is not the cause of heads appearing at the second
also, but the knowledge that the coin has fallen heads already
affects our forecast of its falling thus in the future, since heads in
the past may have been due to a cause which will favour heads
in the future. The possibility of bias in a coin, it may be noticed,

1 "Essay on Probabilities" in the Cabinet Encyclopaedia, p. 26. De Morgan
is not very consistent with himself in his various distinct treatises on this
subject, and other definitions may be found elsewhere. Boole's second definition
of Independence is also adopted by Macfarlane, Algebra of Logic, p. HI.

2 "Theory of Probabilities" in the Encyclopaedia Metropolitana.
always favours 'runs'; this possibility increases the probability
both of 'runs' of heads and of 'runs' of tails.
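The effect of unknown bias can be made concrete with a toy model. In the Python sketch below, which is an illustration under assumptions of our own, the coin's chance of heads is either 2/5 or 3/5, each with probability 1/2, and the tosses are independent once the bias is given. The first toss is then still even, but the probability of two heads exceeds $(\frac{1}{2})^2$.

```python
from fractions import Fraction

def prob_heads(prior):
    """Probability of heads at a single toss; prior maps each possible
    chance of heads to the probability that it is the coin's true chance."""
    return sum(w * p for p, w in prior.items())

def prob_heads_twice(prior):
    """Probability of heads at both of two tosses, assuming the tosses are
    independent given the bias (but not independent unconditionally)."""
    return sum(w * p * p for p, w in prior.items())

def prob_tails_twice(prior):
    return sum(w * (1 - p) * (1 - p) for p, w in prior.items())

# Hypothetical prior: a bias of the same size towards heads or towards tails.
prior = {Fraction(2, 5): Fraction(1, 2), Fraction(3, 5): Fraction(1, 2)}
first = prob_heads(prior)                 # heads at the first toss
both = prob_heads_twice(prior)            # heads at both tosses
second_given_first = both / first         # heads at the second, given heads at the first
print(first, both, second_given_first, prob_tails_twice(prior))
```

With this prior the first toss has probability 1/2 of heads, two heads have probability 13/50, heads at the second toss given heads at the first has probability 13/25, and two tails are favoured equally (13/50).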

This point is discussed at some length in Chapter XXIX. and
further examples will be given there. In this chapter, therefore,
I will do no more than refer to an investigation by Laplace and to
one real and one supposed fallacy of Independence of a type with
which we shall not be concerned in Chapter XXIX.

8. Laplace, in so far as he took account at all of the considerations
explained in § 7, discussed them under the heading of Des
inégalités inconnues qui peuvent exister entre les chances que l'on
suppose égales (unknown inequalities which may exist between chances supposed equal).1 In the case, that is to say, of the coin with
unknown bias, he held that the true probability of heads even
at the first toss differed from $\frac{1}{2}$ by an amount unknown. But
this is not the correct way of looking at the matter. In the
supposed circumstances the initial chances for heads and tails
respectively at the first toss really are equal. What is not true
is that the initial probability of 'heads twice' is equal to the
probability of 'heads once' squared.

Let us write 'heads at first toss' = $h_1$, 'heads at second toss'
= $h_2$. Then $h_1/h = h_2/h = \frac{1}{2}$, and $h_1h_2/h = h_1/h \cdot h_2/h_1h$. Hence
$h_1h_2/h = (h_1/h)^2$ only if $h_2/h_1h = h_2/h$, i.e. if the knowledge that
heads has fallen at the first toss does not affect in the least the
probability of its falling at the second. In general, it is true that
$h_2/h_1h$ will not differ greatly from $h_2/h$ (for relative to most hypotheses
heads at the first toss will not much influence our expectation
of heads at the second), and $\frac{1}{4}$ will, therefore, give a good approximation
to the required probability. Laplace suggests an ingenious
method by which the divergence may be diminished. If we
throw two coins and define 'heads' at any toss as the face thrown
by the second coin, he discusses the probability of 'heads twice
running' with the first coin. The solution of this problem
involves, of course, particular assumptions, but they are of a kind
more likely to be realised in practice than the complete absence
of bias. As Laplace does not state them, and as his proof is
incomplete, it may be worth while to give a proof in detail.

Let $h_1, t_1, h_2, t_2$ denote heads and tails respectively with
the first coin at the first and second tosses respectively, and
$h_1', t_1', h_2', t_2'$ the corresponding events with the second coin; then

1 Essai philosophique, p. 49. See also "Mémoire sur les probabilités," Mém.
de l'Acad. p. 228, and cp. D'Alembert, "Sur le calcul des probabilités,"
Opuscules mathématiques (1780), vol. vii.
the probability (with the above convention) of 'heads twice running,'
i.e. agreement between the two coins twice running, is

$(h_1h_1' + t_1t_1')(h_2h_2' + t_2t_2')/h$.

Since $h_2h_2'/(h_1h_1' + t_1t_1', h) = t_2t_2'/(h_1h_1' + t_1t_1', h)$ by the Principle
of Indifference, and $h_2h_2't_2t_2'/h = 0$,

$(h_2h_2' + t_2t_2')/(h_1h_1' + t_1t_1', h) = 2h_2h_2'/(h_1h_1' + t_1t_1', h)$.

Similarly $(h_1h_1' + t_1t_1')/h = 2h_1h_1'/h$.

We may assume that $h_1/h_1'h = h_1/h$, i.e. that heads with one
coin is irrelevant to the probability of heads with the other; and
$h_1/h = h_1'/h = \frac{1}{2}$ by the Principle of Indifference, so that

$(h_1h_1' + t_1t_1')/h = 2(\frac{1}{2})^2 = \frac{1}{2}$.

$\therefore\ (h_1h_1' + t_1t_1')(h_2h_2' + t_2t_2')/h = \frac{1}{2} \cdot 2h_2h_2'/(h_1h_1' + t_1t_1', h)$

$= h_2'/(h_1h_1' + t_1t_1', h) \cdot h_2/(h_2', h_1h_1' + t_1t_1', h)$

$= \frac{1}{2}\, h_2/(h_2', h_1h_1' + t_1t_1', h)$,

since, $(h_1h_1' + t_1t_1')$ being irrelevant to $h_2'/h$, $h_2'/(h_1h_1' + t_1t_1', h) = h_2'/h = \frac{1}{2}$.

Now $h_2/(h_2', h_1h_1' + t_1t_1', h)$ is greater than $\frac{1}{2}$, since the fact of
the coins having agreed once may be some reason for supposing
they will agree again. But it is less than $h_2/h_1h$: for we may
assume that $h_2/(h_2', h_1h_1' + t_1t_1', h)$ is less than $h_2/(h_2', h_1h_1', h)$,
and also that $h_2/(h_2', h_1h_1', h) = h_2/h_1h$, i.e. that heads twice
running with one coin does not increase the probability of heads
twice running with a different coin. Laplace's method of tossing,
therefore, yields with these assumptions, more or less legitimate
according to the content of h, a probability nearer to $\frac{1}{4}$ than is
$h_1h_2/h$. If $h_2/(h_2', h_1h_1' + t_1t_1', h) = \frac{1}{2}$, then the probability is
exactly $\frac{1}{4}$.
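The conclusion can be illustrated with a toy model. The Python sketch below is not Laplace's own calculation: it assumes, as our own simplification, that each coin's chance of heads is drawn independently from the same symmetric prior (2/5 or 3/5, equally likely) and that tosses are independent given the chances. Under these assumptions the probability of agreement twice running falls between $\frac{1}{4}$ and the single-coin probability of heads twice.

```python
from fractions import Fraction
from itertools import product

def agree_twice(prior1, prior2):
    """Probability that two coins show the same face at both of two tosses.

    Each prior maps a possible chance of heads to its probability; the two
    coins' chances are taken as independent, and tosses as independent given
    the chances (assumptions of this sketch)."""
    total = Fraction(0)
    for (p, wp), (q, wq) in product(prior1.items(), prior2.items()):
        agree = p * q + (1 - p) * (1 - q)  # chance of agreement at one toss
        total += wp * wq * agree * agree
    return total

prior = {Fraction(2, 5): Fraction(1, 2), Fraction(3, 5): Fraction(1, 2)}
single = sum(w * p * p for p, w in prior.items())  # heads twice, one coin
laplace = agree_twice(prior, prior)                # agreement twice, two coins
print(single, laplace)
```

Here agreement twice has probability 313/1250 (0.2504), much nearer to 1/4 than the single-coin value 13/50 (0.26). If the second coin is known to be unbiased, agreement at each toss has probability exactly 1/2 and the result is exactly 1/4.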

9. Two other examples will complete this rather discursive
commentary. It has been supposed that by the Principle of
Indifference the probability of the existence of iron upon Sirius
is $\frac{1}{2}$, and that similarly the probability of the existence there of
any other element is also $\frac{1}{2}$. The probability, therefore, that
not one of the 68 terrestrial elements will be found on Sirius
is $(\frac{1}{2})^{68}$, and that at least one will be found there is $1 - (\frac{1}{2})^{68}$, or
approximately certainty. This argument, or a similar one, has
been seriously advanced. It would seem to prove also, amongst
many other things, that at least one college exactly resembling
some college at either Oxford or Cambridge will almost certainly
be found on Sirius. The fallacy is partly due, as has been pointed
out by Von Kries and others, to an illegitimate use of the Principle
of Indifference. The probability of iron on Sirius is not $\frac{1}{2}$. But
the result is also due to the fallacy of false independence.
It is assumed that the known existence of 67 terrestrial
elements on Sirius would not increase the probability of the
sixty-eighth's being found there also; and that their known
absence would not decrease the sixty-eighth's probability.1
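The arithmetic of the criticised argument, and its dependence on the assumption of independence, can be set out in a few lines of Python. The 'common cause' model in the second half is a deliberately extreme hypothetical of our own, not anything proposed in the text: it keeps the probability of each element at 1/2 but makes the elements occur together or not at all.

```python
from fractions import Fraction

N = 68  # number of terrestrial elements, as in the text

# The criticised argument: each element present with probability 1/2,
# and the 68 events treated as mutually independent.
p_none_indep = Fraction(1, 2) ** N
p_some_indep = 1 - p_none_indep

# Hypothetical contrast: a single common cause decides whether all the
# elements are present or none is, each alternative with probability 1/2.
# Every element still has probability 1/2 on its own.
p_none_common = Fraction(1, 2)
p_some_common = 1 - p_none_common

print(float(p_some_indep), p_some_common)
```

The single-element probabilities are the same in both models; the near-certainty of the first figure comes entirely from treating the elements as independent.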

10. The other example is that of Maxwell's classic mistake in
the theory of gases.2 According to this theory molecules of gas
move with great velocity in every direction. Both the directions
and velocities are unknown, but the probability that a molecule
has a given velocity is a function of that velocity and is independent
of the direction. The maximum velocity and the mean
velocity vary with the temperature. Maxwell seeks to
determine, on these conditions alone, the probability that a
molecule has a given velocity. His argument is as follows:

If $\phi(x)$ represents the probability that the component of
velocity parallel to the axis of X is $x$, the probability that the
velocity has components $x, y, z$ parallel to the three axes is
$\phi(x)\phi(y)\phi(z)$. Thus if $F(v)$ represents the probability of a total
velocity $v$, we have $\phi(x)\phi(y)\phi(z) = F(v)$, where $v^2 = x^2 + y^2 + z^2$.
It is not difficult to deduce from this (assuming that the

1 See Von Kries, Die Principien der Wahrscheinlichkeitsrechnung, p. 10.
Stumpf (Über den Begriff der mathem. Wahrscheinlichkeit, pp. 71-74) argues that
the fallacy results from not taking into account the fact that there might be as
many metals as atomic weights, and that therefore the chance of iron is $1/z$, where
z is the number of possible atomic weights. A. Nitsche (Vierteljsch. f. wissensch.
Philos., 1892) thinks that the real alternatives are 0, or only 1, or only 2 ... or
68 terrestrial elements on Sirius, and that these are equally probable, the chance
of each being $\frac{1}{69}$.

2 I take the statement of this from Bertrand's Calcul des probabilités, p. 30.
Let me here quote a precocious passage on Probability regarded as a branch of
Logic, from a letter written by Maxwell in his nineteenth year (1850), before
he came up to Cambridge: "They say that Understanding ought to work
by the rules of right reason. These rules are, or ought to be, contained in
Logic; but the actual science of logic is conversant at present only with things
either certain, impossible, or entirely doubtful, none of which (fortunately)
we have to reason on. Therefore the true logic for this world is the calculus
of Probabilities, which takes account of the magnitude of the probability
which is, or ought to be, in a reasonable man's mind" (Life, page 143).
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functions    are    analytical)    that    </>(*)    must    be    of    the    form 

It is generally agreed at the present time that this result is
erroneous. But the nature of the error is, I think, quite different
from what it is commonly supposed to be.

Bertrand,1 Poincaré,2 and Von Kries,3 all cite this argument of Maxwell's as an illustration of the fallacy of Independence; and argue that φ(x), φ(y), and φ(z) cannot, as he assumes, represent independent probabilities, if, as he also assumes, the probability of a velocity is a function of that velocity. But it is not in this way that the error in the result really arises. If we do not know what function of the velocity the probability of that velocity is, a knowledge of the velocity parallel to the axes of x and y tells us nothing about the velocity parallel to the axis of z. Maxwell was, I think, quite right to hold that a mere assumption that the probability of a velocity is some function of that velocity does not interfere with the mutual independence of statements as to the velocity parallel to each of the three axes. Let us denote the proposition 'the velocity parallel to the axis of X is x' by X(x), the corresponding propositions relative to the axes of Y and Z by Y(y) and Z(z), and the proposition 'the total velocity is v' by V(v); and let h represent our a priori data. Then if X(x)/h = φ(x), it is a justifiable inference from the Principle of Indifference that Y(y)/h = φ(y) and Z(z)/h = φ(z). Maxwell infers from this that X(x)Y(y)Z(z)/h = φ(x)φ(y)φ(z). That is to say, he assumes that Y(y)/X(x).h = Y(y)/h and that Z(z)/Y(y).X(x).h = Z(z)/h. I do not agree with the authorities cited above that this is illegitimate. So long as we do not know what function of the total velocity the probability of that velocity is, a knowledge of the velocities parallel to the axes of x and y has no bearing on the probability of a given velocity parallel to the axis of z. But Maxwell goes on to infer that X(x)Y(y)Z(z)/h = V(v)/h, where v² = x² + y² + z². It is here, and in a very elementary way, that the error creeps in. The propositions X(x)Y(y)Z(z) and V(v) are not equivalent. The latter follows from the former, but the former does not follow from the latter. There is more than one set of values x, y, z which will yield the same value v. Thus the probability V(v)/h is much greater than the probability X(x)Y(y)Z(z)/h. As we do not know the direction of the total velocity v, there are many ways, not inconsistent with our data, of resolving it into components parallel to the axes. Indeed I think it is a legitimate extension of the preceding argument to put V(v)/h = φ(v), for there is no reason for thinking differently about the direction V from what we think about the direction X.

1 Calcul des probabilités, p. 36.

2 Calcul des probabilités (2nd ed.), pp. 41-44.

3 Wahrscheinlichkeitsrechnung, p. 100.
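A toy discrete model may make the last point concrete. It is not Maxwell's model: the sketch assumes, purely for illustration, that each velocity component is an integer from -2 to 2, the five values equally probable and the three components independent (the step in Maxwell's argument defended above). It compares the probability of one particular conjunction X(x)Y(y)Z(z) with the probability of V(v), which collects every triple lying on the same shell. The helpers `prob_components` and `prob_total` are hypothetical names.

```python
from fractions import Fraction
from itertools import product

# Illustrative assumption: each component is an integer in -2..2, all five
# values equally likely, and the three components mutually independent.
VALUES = range(-2, 3)
P_COMPONENT = Fraction(1, len(VALUES))


def prob_components(x, y, z):
    """Probability of the conjunction X(x)Y(y)Z(z) under independence."""
    for c in (x, y, z):
        if c not in VALUES:
            return Fraction(0)
    return P_COMPONENT ** 3


def prob_total(v_squared):
    """Probability of V(v): sum over every triple with x^2 + y^2 + z^2 = v^2."""
    return sum(
        (prob_components(x, y, z)
         for x, y, z in product(VALUES, repeat=3)
         if x * x + y * y + z * z == v_squared),
        Fraction(0),
    )


one_triple = prob_components(1, 1, 1)
shell = prob_total(3)  # reached only by the 8 sign patterns of (+-1, +-1, +-1)
print(one_triple, shell)  # 1/125 8/125
```

In this model V(v)/h is eight times X(x)Y(y)Z(z)/h, so equating the two, as Maxwell's final step does, cannot be right.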

A  difficulty  analogous  to  this  occurs  in  discussing  the  problem 
of  the  dispersion  of  bullets  over  a  target — a  subject  round  which, 
on  account  of  a  curiosity  which  it  seems  to  have  raised  in  the 
minds  of  many  students  of  probability,  a  literature  has  grown  up 
of  a  bulk  disproportionate  to  its  importance. 

11.  I  now  pass  to  the  Principle  of  Inverse  Probability,  a 
theorem  of  great  importance  in  the  history  of  the  subject.  With 
various  arguments  which  have  been  based  upon  it  I  shall  deal 
in  Chapter  XXX.  But  it  will  be  convenient  to  discuss  here  the 
history  of  the  Principle  itself  and  of  attempts  at  proving  it. 

It first makes its appearance somewhat late in the history of the subject. Not until 1763, when Bayes's theorem was communicated to the Royal Society,1 was a rule for the determination of inverse probabilities explicitly enunciated. It is true that solutions to inductive problems requiring an implicit and more or less fallacious use of the inverse principle had already been propounded, notably by Daniel Bernoulli in his investigations into the statistical evidence in favour of inoculation.2 But the appearance of Bayes's Memoir marks the beginning of a new stage of development. It was followed in 1767 by a contribution from Michell3 to the Philosophical Transactions on the distribution of the stars, to which further reference will be made in Chapter XXV. And in 1774 the rule was clearly, though not quite accurately, enunciated by Laplace in his "Mémoire sur la probabilité des causes par les événemens" (Mémoires présentés à l'Académie des Sciences, vol. vi., 1774). He states the principle as follows (p. 623):

"Si un événement peut être produit par un nombre n de causes différentes, les probabilités de l'existence de ces causes prises de l'événement sont entre elles comme les probabilités de l'événement prises de ces causes; et la probabilité de l'existence de chacune d'elles est égale à la probabilité de l'événement prise de cette cause, divisée par la somme de toutes les probabilités de l'événement prises de chacune de ces causes."

[That is: "If an event can be produced by a number n of different causes, the probabilities of the existence of these causes, derived from the event, are to one another as the probabilities of the event derived from these causes; and the probability of the existence of each of them is equal to the probability of the event derived from that cause, divided by the sum of all the probabilities of the event derived from each of these causes."]

He speaks as if he intended to prove this principle, but he only gives explanations and instances without proof. The principle is not strictly true in the form in which he enunciates it, as will be seen on reference to theorem (38) of Chapter XIV.; and the omission of the necessary qualification has led to a number of fallacious arguments, some of which will be considered in Chapter XXX.

1 Published in the Phil. Trans., vol. liii., 1763, pp. 376-398. This Memoir was communicated by Price after Bayes's death; there was a second Memoir in the following year (vol. liv. pp. 298-310), to which Price himself made some contributions. See Todhunter's History, pp. 299 et seq. Thomas Bayes was a dissenting minister of Tunbridge Wells, who was a Fellow of the Royal Society from 1741 until his death in 1761. A German edition of his contributions to Probability has been edited by Timerding.

2 "Essai d'une nouvelle analyse de la mortalité causée par la petite vérole, et des avantages de l'inoculation pour la prévenir," Hist. de l'Acad., Paris, 1760 (published 1766). Bernoulli argued that the recorded results of inoculation rendered it a probable cause of immunity. This is an inverse argument, though Bayes's theorem is not used in the course of it. See also D. Bernoulli's Memoir on the Inclinations of the Planetary Orbits.

3 Michell's argument owes more, perhaps, to Daniel Bernoulli than to Bayes.

12. The value and originality of Bayes's Memoir are considerable, and Laplace's method probably owes much more to it than is generally recognised or than was acknowledged by Laplace. The principle, often called by Bayes's name, does not appear in his Memoir in the shape given it by Laplace and usually adopted since; but Bayes's enunciation is strictly correct and his method of arriving at it shows its true logical connection with more fundamental principles, whereas Laplace's enunciation gives it the appearance of a new principle specially introduced for the solution of causal problems. The following passage1 gives, in my opinion, a right method of approaching the problem: "If there be two subsequent events, the probability of the second b/N and the probability of both together P/N, and it being first discovered that the second event has happened, from hence I guess that the first event has also happened, the probability I am in the right is P/b." If the occurrence of the first event is denoted by a and of the second by b, this corresponds to

$$ab/h = a/bh \cdot b/h, \qquad\text{and therefore}\qquad a/bh = \frac{ab/h}{b/h};$$

for ab/h = P/N and b/h = b/N, so that a/bh = P/b. The direct and indeed fundamental dependence of the inverse principle on the rule for compound probabilities was not appreciated by Laplace.

1 Quoted by Todhunter, op. cit. p. 296. Todhunter underrates the importance of this passage, which he finds unoriginal, yet obscure.
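Bayes's passage can be checked by counting equally likely cases. The two "subsequent events" below are a hypothetical illustration chosen only for this sketch: two fair dice are thrown, a is "the first die shows six" and b is "the total is at least ten". Counting gives Bayes's N, b and P, and a/bh obtained from the rule for compound probabilities agrees with the direct count P/b.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: the 36 throws of two fair dice are equally likely (h).
cases = list(product(range(1, 7), repeat=2))
N = len(cases)


def first_is_six(throw):        # event a
    return throw[0] == 6


def total_at_least_ten(throw):  # event b
    return sum(throw) >= 10


b_count = sum(1 for t in cases if total_at_least_ten(t))                      # Bayes's b
both_count = sum(1 for t in cases if first_is_six(t) and total_at_least_ten(t))  # Bayes's P

ab_given_h = Fraction(both_count, N)  # P/N
b_given_h = Fraction(b_count, N)      # b/N

# Rule for compound probabilities: ab/h = a/bh . b/h, hence a/bh = (ab/h)/(b/h).
a_given_bh = ab_given_h / b_given_h
print(b_count, both_count, a_given_bh)  # 6 3 1/2
```

The division by b/h is the whole of the "inverse" step; no further principle is needed.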

13. A number of proofs of the theorem have been attempted
since Laplace's time, but most of them are not very satisfactory,
and are generally couched in such a form that they do no more
than recommend the plausibility of their thesis. Mr. McColl1 gave
a symbolic proof, closely resembling theorem (38) when differences
of symbolism are allowed for; and a very similar proof
has also been given by A. A. Markoff.2 I am not acquainted with
any other rigorous discussion of it.

Von Kries3 presents the most interesting and careful example
of a type of proof which has been put forward in one shape or
another by a number of writers. We have initially, according to
this view, a certain number of hypothetical possibilities, all
equally probable, some favourable and some unfavourable to our
conclusion. Experience, or rather knowledge that the event
has happened, rules out a number of these alternatives, and we
are left with a field of possibilities narrower than that with which
we started. Only part of the original field or Spielraum of
possibility is now admissible (zulässig). Causes have a posteriori
probabilities which are proportional to the extent of their occurrence
in the now restricted field of possibility.

There  is  much  in  this  which  seems  to  be  true,  but  it  hardly 
amounts  to  a  proof.  The  whole  discussion  is  in  reality  an 
appeal  to  intuition.  For  how  do  we  know  that  the  possibilities 
admissible  a  posteriori  are  still,  as  they  were  assumed  to  be  a 
priori,  equal  possibilities  ?  Von  Kries  himself  notices  that  there 
is  a  difficulty  ;  and  I  do  not  see  how  he  is  to  avoid  it,  except  by 
the  introduction  of  an  axiom. 

1 "Sixth Paper on the Calculus of Equivalent Statements," Proc. Lond. Math. Soc., 1897, vol. xxviii. p. 567. See also p. 155 above.

2 Wahrscheinlichkeitsrechnung, p. 178.

3 Die Principien der Wahrscheinlichkeitsrechnung, pp. 117-121. The above account of Von Kries's argument is much condensed.

This was in fact the course taken by Professor Donkin in 1851, in an article which aroused some interest in the Philosophical Magazine at the time, but which has since been forgotten. Donkin's theory is, however, of considerable interest. He laid down as one of the fundamental principles of probability the following:1

"If there be any number of mutually exclusive hypotheses h₁, h₂, h₃, ..., of which the probabilities relative to a particular state of information are p₁, p₂, p₃, ..., and if new information be gained which changes the probabilities of some of them, suppose of hₘ₊₁ and all that follow, without having otherwise any reference to the rest, then the probabilities of these latter have the same ratios to one another, after the new information, that they had before."2

Donkin goes on to say that the most important case is where
the new information consists in the knowledge that some of the
hypotheses must be rejected, without any further information
as to those of the original set which are retained. This is the
proposition which Von Kries requires.

As it stands, the phrase "without having otherwise any reference to the rest" obviously lacks precision. An interpretation, however, can be put upon it, with which the principle is true. If, given the old information and the truth of one of the hypotheses h₁ ... hₘ to the exclusion of the rest, the probability of what is conveyed by the new information is the same whichever of the hypotheses h₁ ... hₘ has been taken, then Donkin's principle is valid. For let a be the old information, a′ the new, and let hᵣ/a = pᵣ, hᵣ/aa′ = pᵣ′; then, by the inverse principle,

$$\frac{p_1'}{p_2'} = \frac{p_1}{p_2} \cdot \frac{a'/h_1a}{a'/h_2a} = \frac{p_1}{p_2}, \text{ etc., if } a'/h_1a = a'/h_2a = \cdots,$$

which is the condition already explained.
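Donkin's principle, under the interpretation just given, can be checked numerically. The sketch below assumes four exhaustive hypotheses whose priors pᵣ = hᵣ/a and likelihoods a′/hᵣa are invented for illustration. The first two (standing for h₁ ... hₘ) receive the same likelihood, so their ratio survives the revision; a ratio involving a hypothesis with a different likelihood does not. The function `revise` is a hypothetical helper applying the inverse principle.

```python
from fractions import Fraction as F

# Invented priors h_r/a for four mutually exclusive, exhaustive hypotheses.
priors = [F(1, 10), F(2, 10), F(3, 10), F(4, 10)]
# Invented likelihoods a'/h_r a of the new information a'. The first two are
# equal (the condition stated in the text); the last two differ.
likelihoods = [F(1, 2), F(1, 2), F(1, 5), F(9, 10)]


def revise(priors, likelihoods):
    """Return h_r/aa' by the inverse principle: prior times likelihood, normalised."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


posteriors = revise(priors, likelihoods)
print(posteriors[0] / posteriors[1], priors[0] / priors[1])  # 1/2 1/2
print(posteriors[0] / posteriors[2], priors[0] / priors[2])  # 5/6 1/3
```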

1 "On certain Questions relating to the Theory of Probabilities," Phil. Mag., 4th series, vol. i., 1851.

2 It is interesting to notice that an axiom, practically equivalent to this, has been laid down more lately by A. A. Markoff (Wahrscheinlichkeitsrechnung, p. 8) under the title 'Unabhängigkeitsaxiom' (axiom of independence).

14. Difficulties connected with the Inverse Principle have arisen, however, not so much in attempts to prove the principle as in those to enunciate it, though it may have been the lack of a rigorous proof that has been responsible for the frequent enunciation of an inaccurate principle.

It will be noticed that in the formula (38.2) the a priori probabilities of the hypotheses a₁ and a₂ drop out if p₁ = p₂, and the results can then be expressed in a much simpler shape. This is the shape in which the principle is enunciated by Laplace for the general case,1 and represents the uninstructed view expressed with great clearness by De Morgan:2 "Causes are likely or unlikely, just in the same proportion that it is likely or unlikely that observed events should follow from them. The most probable cause is that from which the observed event could most easily have arisen." If this were true the principle of Inverse Probability would certainly be a most powerful weapon of proof, even equal, perhaps, to the heavy burdens which have been laid on it. But the proof given in Chapter XIV. makes plain the necessity in general of taking into account the a priori probabilities of the possible causes. Apart from formal proof this necessity commends itself to careful reflection. If a cause is very improbable in itself, the occurrence of an event, which might very easily follow from it, is not necessarily, so long as there are other possible causes, strong evidence in its favour. Amongst the many writers who, forgetting the theoretic qualification, have been led into actual error, are philosophers as diverse as Laplace, De Morgan, Jevons, and Sigwart, Jevons3 going so far as to maintain that the fallacious principle he enunciates is "that which common sense leads us to adopt almost instinctively."
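The need for the a priori probabilities can be seen in a small numerical case. The figures below are hypothetical: two possible causes, one of which makes the observed event very likely but is itself very improbable. The function `likelihood_only` ranks causes by the probability of the event alone, as in the passage from De Morgan; `with_priors` weights each likelihood by the prior probability of the cause. Both names are illustrative.

```python
from fractions import Fraction as F

# Hypothetical causes: (a priori probability of the cause, probability of the event given it)
causes = {
    "rare cause": (F(1, 1000), F(9, 10)),
    "common cause": (F(999, 1000), F(1, 10)),
}


def likelihood_only(causes):
    """Posterior as if all priors were equal (the uninstructed rule)."""
    total = sum(lik for _, lik in causes.values())
    return {name: lik / total for name, (_, lik) in causes.items()}


def with_priors(causes):
    """Posterior proportional to prior times likelihood."""
    total = sum(prior * lik for prior, lik in causes.values())
    return {name: prior * lik / total for name, (prior, lik) in causes.items()}


print(likelihood_only(causes)["rare cause"])  # 9/10
print(with_priors(causes)["rare cause"])      # 1/112
```

When the priors are equal the two functions agree, which is the special case p₁ = p₂ in which the simpler enunciation is legitimate.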

1 See the passage quoted above, p. 175.

2 "Essay on Probabilities," in the Cabinet Encyclopaedia, p. 27.

3 Principles of Science, vol. i. p. 280.

15. The theory of the combination of premisses dealt with in §§ 7, 8 of Chapter XIV. has not often been discussed, and the history of it is meagre. Archbishop Whately4 was led astray by a superficial error, and De Morgan, adopting the same mistaken rule, pushed it to the point of absurdity.1 Bishop Terrot2 approached the question more critically. Boole's3 last and most considered contribution to the subject of probability dealt with the same topic. I know of no discussion of it during the past sixty years.

4 Logic, 8th ed. p. 211: "As in the case of two probable premisses, the conclusion is not established except upon the supposition of their being both true, so in the case of two distinct and independent indications of the truth of some proposition, unless both of them fail, the proposition must be true: we therefore multiply together the fractions indicating the probability of the failure of each (the chances against it) and, the result being the total chances against the establishment of the conclusion by these arguments, this fraction being deducted from unity, the remainder gives the probability for it. E.g. a certain book is conjectured to be by such and such an author, partly, 1st, from its resemblance in style to his known works; partly, 2nd, from its being attributed to him by some one likely to be pretty well informed. Let the probability of the conclusion, as deduced from one of these arguments by itself, be supposed 2/3, and in the other case 3/4; then the opposite probabilities will be 1/3 and 1/4, which multiplied together give 1/12 as the probability against the conclusion. . . ."

The Archbishop's error, in that a negative can always be turned into an affirmative by a change of verbal expression, was first pointed out by a mere diocesan, Bishop Terrot, in the Edin. Phil. Trans. vol. xxi. The mistake is well explained by Boole in the same volume of the Edin. Phil. Trans.: "A confusion may here be noted between the probability that a conclusion is proved, and the probability in favour of a conclusion furnished by evidence which does not prove it. In the proof and statement of his rule, Archbishop Whately adopts the former view of the nature of the probabilities concerned in the data. In the exemplification of it, he adopts the latter."

1 "Theory of Probabilities," Encyclopaedia Metropolitana, p. 400. He shows by means of it that "if any assertion appear neither likely nor unlikely in itself, then any logical argument in favour of it, however weak the premisses, makes it in some degree more likely than not, a theorem which will be readily admitted on its own evidence." He then gives an example: "a priori vegetation on the planets is neither likely nor unlikely; suppose argument from analogy makes it 1/10; then the total probability is 1/2 + 1/2 · 1/10, or 11/20." De Morgan seems to accept without hesitation the conclusion to be derived from this, that everything which is not impossible is as probable as not.

2 "On the Possibility of combining two or more Probabilities of the same Event, so as to form one definite Probability," Edin. Phil. Trans., 1856, vol. xxi.

3 "On the Application of the Theory of Probabilities to the Question of the Combination of Testimonies or Judgments," Edin. Phil. Trans., 1857, vol. xxi.

Boole's treatment is full and detailed. He states the problem as follows: "Required the probability of an event z, when two circumstances x and y are known to be present, the probability of the event z, when we know only of the existence of the circumstance x, being p, and the probability, when we only know of the existence of y, being q."4 His solution, however, is vitiated by the fundamental error examined in § 6 above. Two of his conclusions may be mentioned for their plausibility, but neither is valid.

4 Loc. cit. p. 631. Boole's principle (loc. cit. p. 620) that "the mean strength of any probabilities of an event which are founded upon different judgments or observations is to be measured by that supposed probability of the event a priori which those judgments or observations following thereupon would not tend to alter," is not correct if it means more than that the mean strength of z/x and z/y is to be measured by z/xy.

"If the causes in operation, or the testimonies borne," he argues, "are, separately, such as to leave the mind in a state of equipoise as respects the event whose probability is sought, united they will but produce the same effect." If, that is to say, a/h₁ = ½ and a/h₂ = ½, he concludes that a/h₁h₂ = ½. The plausibility of this is superficial. Consider, for example, the following instance: h₁ = A is black and B is black or white, h₂ = B is black and A is black or white, a = both A and B are black. Here a/h₁ = a/h₂ = ½, but a/h₁h₂ = 1. Boole also concluded without valid reason that a/h₁h₂ increases, the greater the a priori improbability of the combination h₁h₂.
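Both plausible rules of this section can be tested by enumerating equally probable cases. The sketch assumes that each of A and B is black or white with equal probability, independently, and that h₁ and h₂ merely add the information stated in the instance above. It also applies Whately's rule (one minus the product of the chances against) to illustrative probabilities 2/3 and 3/4 for a conclusion and then to the corresponding 1/3 and 1/4 for its negation, showing that the rule gives different answers when a negative is turned into an affirmative. All function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Assumed model: A and B are each black or white, the four combinations
# being equally likely a priori.
WORLDS = list(product(["black", "white"], repeat=2))  # (colour of A, colour of B)


def prob(event, given):
    """Probability of `event` on the evidence `given`, by counting equal cases."""
    admissible = [w for w in WORLDS if given(w)]
    return F(sum(1 for w in admissible if event(w)), len(admissible))


def a(w):     # both A and B are black
    return w[0] == "black" and w[1] == "black"


def h1(w):    # A is black; B is black or white
    return w[0] == "black"


def h2(w):    # B is black; A is black or white
    return w[1] == "black"


def h1h2(w):  # both pieces of evidence together
    return h1(w) and h2(w)


print(prob(a, h1), prob(a, h2), prob(a, h1h2))  # 1/2 1/2 1


def whately(*probs_for):
    """Whately's rule: one minus the product of the chances against."""
    against = F(1)
    for p in probs_for:
        against *= 1 - p
    return 1 - against


for_conclusion = whately(F(2, 3), F(3, 4))  # rule applied to the conclusion
for_negation = whately(F(1, 3), F(1, 4))    # same rule applied to its negation
print(for_conclusion, 1 - for_negation)     # 11/12 1/2
```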

16.  The  theory  of  "  Testimony  "  itself,  the  theory,  that  is  to 
say,  of  the  combination  of  the  evidence  of  witnesses,  has  occupied 
so  considerable  a  space  in  the  traditional  treatment  of  Probability 
that  it  will  be  worth  while  to  examine  it  briefly.  It  may,  however, 
be  safely  said  that  the  principal  conclusions  on  the  subject  set 
out  by  Condorcet,  Laplace,  Poisson,  Cournot,  and  Boole,  are 
demonstrably  false.  The  interest  of  the  discussion  is  chiefly  due 
to  the  memory  of  these  distinguished  failures. 

It seems to have been generally believed by these and other logicians and mathematicians1 that the probability of two witnesses speaking the truth, who are independent in the sense that there is no collusion between them, is always the product of the probabilities that each of them separately will speak the truth.2 On this basis conclusions such as the following, for example, are arrived at:

X and Y are independent witnesses (i.e. there is no collusion between them). The probability that X will speak the truth is x, that Y will speak the truth is y. X and Y agree in a particular statement. The chance that this statement is true is

$$\frac{xy}{xy + (1 - x)(1 - y)}.$$

For the chance that they both speak the truth is xy, and the chance that they both speak falsely is (1 − x)(1 − y). As, in this case, our hypothesis is that they agree, these two alternatives are exhaustive; whence the above result, which may be found in almost every discussion of the subject.

1 Perhaps M. Bertrand should be registered as an honourable exception. At least he points out a precisely analogous fallacy in an example where two meteorologists prophesy the weather, Calcul des probabilités, p. 31.

2 E.g., Boole, Laws of Thought, p. 279.
De Morgan, Formal Logic, p. 195.
Condorcet, Essai, p. 4.
Lacroix, Traité, p. 248.
Cournot, Exposition, p. 354.
Poisson, Recherches, p. 323.
This list could be greatly extended.

The fallacy of such reasoning is easily exposed by a more exact statement of the problem. For let a₁ stand for "X₁ asserts a," and let a/a₁h = x₁, where h, our general data, is by itself irrelevant to a, i.e., x₁ is the probability that a statement is true of which we only know that X₁ has asserted it. Similarly let us write b/b₂h = x₂, where b₂ stands for "X₂ asserts b." The above argument then assumes that, if X₁ and X₂ are witnesses who are causally independent in the sense that there is no collusion between them, direct or indirect, ab/a₁b₂h = a/a₁h . b/b₂h = x₁x₂.

But ab/a₁b₂h = a/a₁bb₂h . b/a₁b₂h, and this is not equal to x₁x₂ unless a/a₁bb₂h = a/a₁h and b/a₁b₂h = b/b₂h. It is not a sufficient condition for this, as seems usually to be supposed, that X₁ and X₂ should be witnesses causally independent of one another. It is also necessary that a and b, i.e. the propositions asserted by the witnesses, should be irrelevant to one another and also each of them irrelevant to the fact of the assertion of the other by a witness. If a knowledge of a affects the probability either of b or of b₂, it is evident that the formula breaks down. In the one extreme case, where the assertions of the two contradict one another, ab/a₁b₂h = 0. In the other extreme, where the two agree in the same assertion, i.e. where a = b, a/a₁bb₂h = 1 and not = a/a₁h.
17. The special problem of the agreement of witnesses, who make the same statement, can be best attacked as follows, a certain amount of simplification being introduced. Let the general data h of the problem include the hypothesis that X₁ and X₂ are each asked and reply to a question to which there is only one correct answer. Let aᵢ = "Xᵢ asserts a in reply to the question," and mᵢ = "Xᵢ gives the correct answer to the question." Then

$$m_1/a_1h = x_1 \qquad\text{and}\qquad m_2/a_2h = x_2,$$

x₁ and x₂ being, in the conventional language of the problem, the "credibilities" of the witnesses. We have, since the witnesses agree, and since a follows from m₁a₁ and m₁ follows from aa₁,

$$m_1/a_1a_2h = m_1m_2/a_1a_2h = m_2/a_1a_2h;$$

$$a/a_1h = m_1/a_1h = x_1; \qquad a/a_1m_1h = 1; \qquad m_1/aa_1h = 1.$$
Also, since the witnesses are, in the ordinary sense, "independent" witnesses, a₂/a₁ah = a₂/ah and a₂/a₁āh = a₂/āh, where ā denotes the contradictory of a; that is to say, the probability of X₂'s asserting a is independent of the fact of X₁'s having asserted a, given we know that a is, in fact, true or false, as the case may be.

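The probability that, if the witnesses agree, their assertion is true is m₁/a₁a₂h, and

$$m_1/a_1a_2h = a/a_1a_2h = \frac{a_2/a_1ah \cdot a/a_1h}{a_2/a_1h} = \frac{a_2/a_1ah \cdot x_1}{a_2a/a_1h + a_2\bar{a}/a_1h} = \frac{a_2/a_1ah \cdot x_1}{a_2/a_1ah \cdot x_1 + a_2/a_1\bar{a}h \cdot (1 - x_1)}.$$

If this is to be equal to x₁x₂/(x₁x₂ + (1 − x₁)(1 − x₂)), we must have

$$\frac{a_2/a_1ah}{a_2/a_1\bar{a}h} = \frac{x_2}{1 - x_2}.$$

Now, by the hypothesis of independence, and since a/a₂h = m₂/a₂h = x₂,

$$\frac{a_2/a_1ah}{a_2/a_1\bar{a}h} = \frac{a_2/ah}{a_2/\bar{a}h} = \frac{a_2a/h}{a_2\bar{a}/h} \cdot \frac{\bar{a}/h}{a/h} = \frac{a/a_2h}{\bar{a}/a_2h} \cdot \frac{\bar{a}/h}{a/h} = \frac{x_2}{1 - x_2} \cdot \frac{\bar{a}/h}{a/h}.$$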

This then is the assumption which has tacitly slipped into the
conventional formula: that a/h = ā/h = ½. It is assumed, that
is to say, that any proposition taken at random is as likely as
not to be true, so that any answer to a given question is, a priori,
as likely as not to be correct. Thus the conventional formula
ought to be employed only in those cases where the answer
which the "independent" witnesses agree in giving is, a priori
and apart from their agreement, as likely as not.
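The result of §17 can be put in a few lines of code. The sketch takes the credibilities x₁ = m₁/a₁h and x₂ = m₂/a₂h and the a priori probability a/h as inputs, uses the ratio derived in §17 for (a₂/a₁ah)/(a₂/a₁āh), and compares the outcome with the conventional formula. The function names `conventional` and `agreement` are illustrative, and the prior is assumed to lie strictly between 0 and 1.

```python
from fractions import Fraction as F


def conventional(x1, x2):
    """The traditional formula x1 x2 / (x1 x2 + (1 - x1)(1 - x2))."""
    return x1 * x2 / (x1 * x2 + (1 - x1) * (1 - x2))


def agreement(x1, x2, prior):
    """m1/a1a2h for independent witnesses who agree, given a/h = prior (0 < prior < 1)."""
    # (a2/a1ah)/(a2/a1 not-a h) = x2/(1 - x2) . (not-a/h)/(a/h), as derived in §17
    ratio = x2 / (1 - x2) * (1 - prior) / prior
    return ratio * x1 / (ratio * x1 + (1 - x1))


x1 = x2 = F(3, 4)
print(conventional(x1, x2))         # 9/10
print(agreement(x1, x2, F(1, 2)))   # 9/10: agrees only because a/h = 1/2
print(agreement(x1, x2, F(1, 10)))  # 81/82
```

With a/h = ½ the prior factor in `ratio` is 1 and the two functions coincide; for any other prior they part company.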

18. A somewhat similar confusion has led to the controversy as to whether and in what manner the a priori improbability of a statement modifies its credibility in the mouth of a witness whose degree of reliability is known. The fallacy of attaching the same weight to a testimony regardless of the character of what is asserted, is pointed out, of course, by Hume in the Essay on Miracles, and his argument, that the great a priori improbability of some assertions outweighs the force of testimony otherwise reliable, depends on the avoidance of it. The correct view is also taken by Laplace in his Essai philosophique (pp. 98-102), where he argues that a witness is less to be believed when he asserts an extraordinary fact, declaring the opposite view (taken by Diderot in the article on "Certitude" in the Encyclopédie) to be inconceivable before "le simple bon sens" (plain common sense).

The manner in which the resultant probability is affected
depends upon the precise meaning we attach to "degree of reliability"
or "coefficient of credibility." If a witness's credibility
is represented by x, do we mean that, if a is the true answer,
the probability of his giving it is x, or do we mean that if he
answers a the probability of a's being true is x? These two things
are not equivalent.

Let a₁ stand for "a is asserted by the witness"; h₁ for our evidence bearing on the witness's veracity; and h₂ for other evidence bearing on the truth of a. Let a/h₁h₂, i.e. the a priori probability of a apart from our knowledge of the fact that the witness has asserted it, be represented by p.

Let a/a₁h₁ = x₁ and a₁/ah₁ = x₂; so that

$$x_1 = \frac{a/h_1}{a_1/h_1} \cdot x_2.$$

In general a/h₁ ≠ a₁/h₁. Do we mean by the witness's credibility x₁ or x₂?

We require a/a₁h₁h₂.

Let a₁/āh₁ = r (ā denoting the contradictory of a), i.e. the probability, apart from our special knowledge concerning a, that, if a is false, the witness will hit on that particular falsehood. Then

$$a/a_1h_1h_2 = \frac{p\,x_2}{p\,x_2 + (1 - p)\,r},$$

for a₁/ah₁h₂ = a₁/ah₁ and a₁/āh₁h₂ = a₁/āh₁, since, given certain knowledge concerning a, h₂ is irrelevant to the probability of a₁.
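The formula for a/a₁h₁h₂ shows at once why an extraordinary assertion is less to be believed. The sketch evaluates it for a witness with fixed x₂ and r, once for an assertion that is a priori as likely as not and once for one that is a priori very improbable; the numbers, and the name `credibility_of_assertion`, are hypothetical.

```python
from fractions import Fraction as F


def credibility_of_assertion(p, x2, r):
    """a/a1h1h2 = p x2 / (p x2 + (1 - p) r).

    p  : a/h1h2, the a priori probability of a
    x2 : a1/ah1, the chance that the witness asserts a if a is true
    r  : a1/(not-a)h1, the chance that he hits on this falsehood if a is false
    """
    return p * x2 / (p * x2 + (1 - p) * r)


x2, r = F(9, 10), F(1, 10)
print(credibility_of_assertion(F(1, 2), x2, r))      # 9/10
print(credibility_of_assertion(F(1, 10000), x2, r))  # 1/1112
```

The same witness, described by the same x₂ and r, lends an ordinary assertion a probability of 9/10 and an extraordinary one a probability of about one in eleven hundred, which is the substance of Hume's and Laplace's position.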
19. Generally speaking, all problems in regard to the combination
of testimonies, or to the combination of evidence derived
from testimony with evidence derived from other sources, may
be treated as special instances of the general problem of the
combination of arguments. Beyond pointing out the above
plausible fallacies, there is little to add. Mr. W. E. Johnson,
however, has proposed a method of defining credibility which
is sometimes valuable, because it regards the witness's credibility
not absolutely, but with reference to a given type of question,
so that it enables us to measure the force of the witness's testimony
under special circumstances. If $a$ represents the fact of A's
testimony regarding $x$, then we may define A's credibility for $x$
as $\alpha$, where $\alpha$ is given by the equation

$$x/ah = x/h + \alpha \cdot \bar{x}/h \cdot x/h;$$

so that $\alpha \cdot \bar{x}/h \cdot x/h$ measures the amount by which A's assertion
of $x$ increases its probability.

20. One of the most ancient problems in probability is concerned
with the gradual diminution of the probability of a past
event, as the length of the tradition increases by which it is
established. Perhaps the most famous solution of it is that
propounded by Craig in his Theologiae Christianae Principia
Mathematica, published in 1699.1 He proves that suspicions of
any history vary in the duplicate ratio of the times taken from
the beginning of the history, in a manner which has been described
as a kind of parody of Newton's Principia. "Craig," says
Todhunter, "concluded that faith in the Gospel so far as it
depended on oral tradition expired about the year 880, and that
so far as it depended on written tradition it would expire in the
year 3150. Peterson by adopting a different law of diminution
concluded that faith would expire in 1789."2 About the same
time Locke raised the matter in chap. xvi. bk. iv. of the
Essay Concerning Human Understanding: "Traditional testimonies
the farther removed, the less their proof. . . . No
Probability can rise higher than its first original." This is
evidently intended to combat the view that the long acceptance
by the human race of a reputed fact is an additional argument

1 See Todhunter's History, p. 54. It has been suggested that the anonymous
essay in the Phil. Trans. for 1699 entitled "A Calculation of the Credibility
of Human Testimony" is due to Craig. In this it is argued that, if the
credibilities of a set of witnesses are $p_1, \ldots, p_n$, then, if they are
successive, the resulting probability is the product $p_1p_2 \ldots p_n$; if they are
concurrent, it is

$$1 - (1 - p_1)(1 - p_2) \ldots (1 - p_n).$$

This last result follows from the supposition that the first witness leaves an
amount of doubt represented by $1 - p_1$; of this the second removes the fraction
$p_2$, and so on. See also Lacroix, Traité élémentaire, p. 262. The above theory
was actually adopted by Bicquilley.

2  In  the  Budget  of  Paradoxes  De  Morgan  quotes  Lee,  the  Cambridge  Orientalist, 
to  the  effect  that  Mahometan  writers,  in  reply  to  the  argument  that  the  Koran 
has  not  the  evidence  derived  from  Christian  miracles,  contend  that,  as  evidence 
of  Christian  miracles  is  daily  weaker,  a  time  must  at  last  arrive  when  it  will 
fail  of  affording  assurance  that  they  were  miracles  at  all  :  whence  the  necessity 
of  another  prophet  and  other  miracles. 
in its favour, and that a long tradition increases rather than
diminishes the strength of an assertion. "This is certain," says
Locke, "that what in one age was affirmed upon slight grounds,
can  never  after  come  to  be  more  valid  in  future  ages,  by  being 
often  repeated."  In  this  connection  he  calls  attention  to  "  a 
rule  observed  in  the  law  of  England,  which  is,  that  though  the 
attested  copy  of  a  record  be  good  proof,  yet  the  copy  of  a  copy 
never  so  well  attested,  and  by  never  so  credible  witnesses,  will 
not  be  admitted  as  a  proof  in  Judicature."  If  this  is  still  a  good 
rule  of  law,  it  seems  to  indicate  an  excessive  subservience  to  the 
principle  of  the  decay  of  evidence. 

But,  although  Locke  affirms  sound  maxims,  he  gives  no  theory 
that  can  afford  a  basis  for  calculation.  Craig,  however,  was  the 
more  typical  professor  of  probability,  and  in  attempting  an 
algebraic  formula  he  was  the  first  of  a  considerable  family.  The 
last  grand  discussion  of  the  problem  took  place  in  the  columns 
of  the  Educational  Times.1  Macfarlane 2  mentions  that  four 
different  solutions  have  been  put  forward  by  mathematicians 
of  the  problem  :  "  A  says  that  B  says  that  a  certain  event  took 
place  ;  required  the  probability  that  the  event  did  take  place, 
$p_1$ and $p_2$ being A's and B's respective probabilities of speaking
the  truth."  Of  these  solutions  only  Cayley's  is  correct. 

1 Reprinted in Mathematics from the Educational Times, vol. xxvii.

2 Algebra of Logic, p. 151. Macfarlane attempts a solution of the general
problem without success. Its solution is not difficult, if enough unknowns are
introduced, but of very little interest.


CHAPTER  XVII 

SOME  PROBLEMS  IN  INVERSE  PROBABILITY,  INCLUDING  AVERAGES 

1.  THE  present  chapter  deals  with  '  problems  ' — that  is  to 
say,  with  applications  to  particular  abstract  questions  of  some  of 
the  fundamental  theorems  demonstrated  in  Chapter  XIV.  It 
is  without  philosophical  interest  and  should  probably  be  omitted 
by most readers. I introduce it here in order to show the analytical
power of the method developed above and its advantage
in  ease  and  especially  in  accuracy  over  other  methods  which 
have  been  employed.1  §  2  is  mainly  based  upon  some  problems 
discussed  by  Boole.  §§  3-7  deal  with  the  fundamental  theory 
connecting  averages  and  laws  of  error.  §§  8-11  treat  discursively 
the  Arithmetic  Average,  the  Method  of  Least  Squares,  and 
Weighting. 

2.  In  the  following  paragraph  solutions  are  given  of  some 
problems posed by Boole in chapter xx. of his Laws of Thought.
Boole's  own  method  of  solving  them  is  constantly  erroneous,2 
and  the  difficulty  of  his  method  is  so  great  that  I  do  not  know 
of  any  one  but  himself  who  has  ever  attempted  to  use  it.  The 
term  '  cause  '  is  frequently  used  in  these  examples  where  it  might 
have  been  better  to  use  the  term  '  hypothesis.'  For  by  a  possible 
cause  of  an  event  no  more  is  here  meant  than  an  antecedent 
occurrence,  the  knowledge  of  which  is  relevant  to  our  anticipation 
of  the  event ;  it  does  not  mean  an  antecedent  from  which  the 
event  in  question  must  follow. 

(56) The a priori probabilities of two causes $A_1$ and $A_2$
are $c_1$ and $c_2$ respectively. The probability that, if the cause $A_1$

1  Such  examples  as  these  might  sometimes  be  set  to  test  the  wits  of  students. 
The  problems  on  Probability  usually  given  are  simply  problems  on  mathematical 
combinations.     These,  on  the  other  hand,  are  really  problems  in  logic. 

2 For the reason given in § 6 of Chapter XVI. The solutions of problems
I.-VI., for example, in the Laws of Thought, chap. xx., are all erroneous.
occur, an event E will accompany it (whether as a consequence
of $A_1$ or not) is $p_1$, and the probability that E will accompany $A_2$,
if $A_2$ present itself, is $p_2$. Moreover, the event E cannot appear
in the absence of both the causes $A_1$ and $A_2$. Required the probability
of the event E.

This  problem  is  of  great  historical  interest  and  has  been  called 
Boole's  '  Challenge  Problem.'  Boole  originally  proposed  it  for 
solution  to  mathematicians  in  1851  in  the  Cambridge  and  Dublin 
Mathematical Journal. A result was given by Cayley1 in the
Philosophical  Magazine,  which  Boole  declared  to  be  erroneous.2 
He  then  entered  the  field  with  his  own  solution.3  "  Several 
attempts  at  its  solution,"  he  says,  "  have  been  forwarded  to  me, 
all  of  them  by  mathematicians  of  great  eminence,  all  of  them 
admitting  of  particular  verification,  yet  differing  from  each  other 
and  from  the  truth."  4  After  calculations  of  considerable  length 
and  great  difficulty  he  arrives  at  the  conclusion  that  u  is  the 
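probability of the event E, where $u$ is that root of the equation

$$\frac{\{1 - c_1(1 - p_1) - u\}\{1 - c_2(1 - p_2) - u\}}{1 - u} = \frac{(u - c_1p_1)(u - c_2p_2)}{c_1p_1 + c_2p_2 - u}$$

which is not less than $c_1p_1$ and $c_2p_2$, and not greater than
$1 - c_1(1 - p_1)$, $1 - c_2(1 - p_2)$ or $c_1p_1 + c_2p_2$.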
This solution can easily be seen to be wrong. For in the
case when $A_1$ and $A_2$ cannot both occur, the solution is
$u = c_1p_1 + c_2p_2$, whereas Boole's equations do not reduce to

1 Phil. Mag. 4th series, vol. vi.

2 Cayley's solution was defended against Boole by Dedekind (Crelle's Journal,
vol. l. p. 268). The difference arises out of the extreme ambiguity as to the
meaning of the terms as employed by Cayley.

3 "Solution of a Question in the Theory of Probabilities," Phil. Mag. 4th
series, vol. vii., 1854. This solution is the same as that printed by Boole
shortly afterwards in the Laws of Thought. In the Phil. Mag.
Wilbraham gave as the solution $u = c_1p_1 + c_2p_2 - z$, where $z$ is necessarily less
than either $c_1p_1$ or $c_2p_2$. This solution is correct so far as it goes, but is not
complete. The problem is discussed by Macfarlane, Algebra of Logic, p. 154.

4 In proposing the problem Boole had said: "The motives which have
led me, after much consideration, to adopt, with reference to this question, a
course unusual in the present day, and not upon slight grounds to be revived,
are the following: First, I propose the question as a test of the sufficiency of
received methods. Secondly, I anticipate that its discussion will in some
measure add to our knowledge of an important branch of pure analysis."
When printing his own solution in the Laws of Thought, he adds that the
above "led to some interesting private correspondence, but did not elicit a
solution."
this simplified form. The mistake which Boole has made is
the one general to his system, referred to in Chapter XVI., § 6.1

The  correct  solution,  which  is  very  simple,  can  be  reached  as 
follows  : 

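Let $a_1$, $a_2$, $e$ assert the occurrences of the two causes and the
event respectively, and let $h$ be the data of the problem.

Then we have $a_1/h = c_1$, $a_2/h = c_2$, $e/a_1h = p_1$, $e/a_2h = p_2$; we
require $e/h$. Let $e/h = u$, and let $a_1a_2/eh = z$. Since the event
cannot occur in the absence of both the causes,

$$e/\bar{a}_1\bar{a}_2h = 0.$$

It follows from this that $\bar{a}_1\bar{a}_2/eh = 0$, unless $e/h = 0$,

i.e. $(a_1 + a_2)/eh = 1$,

whence $a_1/eh + a_2/eh = 1 + a_1a_2/eh$ by (24). But $a_1/eh = e/a_1h \cdot a_1/h \div e/h = c_1p_1/u$, and similarly $a_2/eh = c_2p_2/u$;

$$\therefore \frac{c_1p_1}{u} + \frac{c_2p_2}{u} = 1 + z, \qquad u = \frac{c_1p_1 + c_2p_2}{1 + z},$$

where $z$ is the probability after the event that both the causes were
present.

If we write $ea_1a_2/h = y$, so that $y = zu$,

$$u = c_1p_1 + c_2p_2 - y.$$

Boole's solution fails by attempting to be independent of
$y$ or $z$.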
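The relations $u = (c_1p_1 + c_2p_2)/(1 + z)$ and $u = c_1p_1 + c_2p_2 - y$ can be checked numerically. The sketch below assumes only the data of the problem. It draws an arbitrary joint distribution over $a_1$, $a_2$, $e$ in which $e$ never occurs without one of the causes. It then reads off $c_1, c_2, p_1, p_2, u, y, z$ and compares the three values. The helper names are hypothetical.

```python
import itertools
import random


def two_cause_check(seed: int) -> tuple[float, float, float]:
    """Return (u, (c1p1 + c2p2)/(1 + z), c1p1 + c2p2 - y) for a random model."""
    rng = random.Random(seed)
    weights = {}
    for a1, a2, e in itertools.product((0, 1), repeat=3):
        # The event cannot appear in the absence of both causes.
        impossible = e and not a1 and not a2
        weights[(a1, a2, e)] = 0.0 if impossible else rng.random()
    total = sum(weights.values())
    joint = {state: w / total for state, w in weights.items()}

    def prob(pred) -> float:
        return sum(p for (a1, a2, e), p in joint.items() if pred(a1, a2, e))

    c1 = prob(lambda a1, a2, e: a1)
    c2 = prob(lambda a1, a2, e: a2)
    p1 = prob(lambda a1, a2, e: a1 and e) / c1
    p2 = prob(lambda a1, a2, e: a2 and e) / c2
    u = prob(lambda a1, a2, e: e)
    y = prob(lambda a1, a2, e: a1 and a2 and e)  # y = e a1 a2 / h
    z = y / u                                     # z = a1 a2 / e h
    return u, (c1 * p1 + c2 * p2) / (1 + z), c1 * p1 + c2 * p2 - y


for seed in range(3):
    print(two_cause_check(seed))  # the three numbers agree in each row
```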

(56.1) Suppose that we wish to find limits for the solution
which are independent of $y$ and $z$: then, since $y \geq 0$,

$$u \leq c_1p_1 + c_2p_2.$$

Again

$$e/h \leq 1 - c_1 + c_1p_1 \quad \text{by (24.2) and (4).}$$

Similarly $e/h \leq 1 - c_2 + c_2p_2$. From the same equations it appears
that $e/h \geq c_1p_1$ and $e/h \geq c_2p_2$.


1 Boole's error is pointed out, and a correct solution given, in Mr. McColl's
"Calculus of Equivalent Statements" (Proc. Lond. Math. Soc. vol. xxviii.).
$\therefore u$ lies between the greatest of $c_1p_1$ and $c_2p_2$, and the least of
$1 - c_1(1 - p_1)$, $1 - c_2(1 - p_2)$ and $c_1p_1 + c_2p_2$.


It  will  be  seen  that  these  numerical  limits  are  the  same  as  the 
limits  obtained  by  Boole  for  the  roots  of  his  equations. 
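A concrete case shows why Boole's answer fails. The sketch below finds the root of his equation by bisection between the limits just given. The equation is first cross-multiplied, so that it becomes a cubic in $u$ without poles. Take the hypothetical figures $c_1 = 0.5$, $p_1 = 0.6$, $c_2 = 0.3$, $p_2 = 0.5$, with causes that cannot both occur. The correct value is then $c_1p_1 + c_2p_2 = 0.45$, while Boole's root comes out near 0.386. The function names are illustrative only.

```python
def boole_residual(u: float, c1: float, p1: float, c2: float, p2: float) -> float:
    """Boole's equation with denominators cleared:
    (1-c1(1-p1)-u)(1-c2(1-p2)-u)(c1p1+c2p2-u) - (1-u)(u-c1p1)(u-c2p2)."""
    lhs = (1 - c1 * (1 - p1) - u) * (1 - c2 * (1 - p2) - u) * (c1 * p1 + c2 * p2 - u)
    rhs = (1 - u) * (u - c1 * p1) * (u - c2 * p2)
    return lhs - rhs


def limits(c1: float, p1: float, c2: float, p2: float) -> tuple[float, float]:
    """The limits of (56.1), which coincide with those Boole gives for his root."""
    lo = max(c1 * p1, c2 * p2)
    hi = min(1 - c1 * (1 - p1), 1 - c2 * (1 - p2), c1 * p1 + c2 * p2)
    return lo, hi


def boole_root(c1: float, p1: float, c2: float, p2: float, tol: float = 1e-12) -> float:
    """Bisection: the residual is >= 0 at the lower limit and <= 0 at the upper."""
    lo, hi = limits(c1, p1, c2, p2)
    f_lo = boole_residual(lo, c1, p1, c2, p2)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = boole_residual(mid, c1, p1, c2, p2)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


print(boole_root(0.5, 0.6, 0.3, 0.5))  # about 0.386, not the correct 0.45
```

The exclusiveness of the causes never enters Boole's equation. That is why his root stays strictly below $c_1p_1 + c_2p_2$ even in the case where that sum is the exact answer.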

(56.2) Suppose that the a priori probabilities of the causes, $c_1$
and $c_2$, are to be eliminated. The only limit we then have is

(56.3) Suppose that one of the a priori probabilities, $c_2$, is to be
eliminated. We then have limits $c_1p_1 \leq u \leq 1 - c_1 + c_1p_1$. If, therefore,
$c_1$ is large, $u$ does not differ widely from $c_1p_1$.

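(56.4) Suppose $p_2$ is to be eliminated. We then have

$$c_1p_1 \leq u \leq \text{the least of } 1 - c_1(1 - p_1) \text{ and } c_1p_1 + c_2.$$

If therefore $c_1$ is large or $c_2$ small, $u$ does not differ widely
from $c_1p_1$.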

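(56.5) If $a_1/a_2h = a_1/h$, i.e. if our knowledge of each of the
causes is independent, we have a closer approximation. For
in this case $y = ea_1a_2/h = c_1c_2 \cdot e/a_1a_2h$, so that $u = c_1p_1 + c_2p_2 - c_1c_2 \cdot e/a_1a_2h$.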

(57) We may now generalise (56) and discuss the case of $n$
causes. If an event can only happen as a consequence of one
or more of certain causes $A_1, A_2, \ldots, A_n$, and if $c_r$ is the a priori
probability of the cause $A_r$ and $p_r$ the probability that, if the
cause $A_r$ be known to exist, the event E will occur: required the
probability of E.

This is Boole's problem VI. (Laws of Thought, chap. xx.). As
the result of ten pages of mathematics, he finds the solution to be
the root, lying between certain limits, of an equation of the $n$th
degree which he cannot solve. I know no other discussion of the
problem. The solution is as follows:

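$$e/h = ea_1/h + e\bar{a}_1/h = e\bar{a}_1/h + e/a_1h \cdot a_1/h = e\bar{a}_1/h + c_1p_1. \qquad \text{(i.)}$$

Since $e\bar{a}_1/h = e\bar{a}_1\bar{a}_2/h + ea_2/h - ea_1a_2/h$,

$$\therefore e/h = e\bar{a}_1\bar{a}_2/h + c_1p_1 + c_2p_2 - ea_1a_2/h,$$

and so on. In general

$$\begin{aligned}
e\bar{a}_1 \ldots \bar{a}_{r-1}/h &= e\bar{a}_1 \ldots \bar{a}_{r-1}\bar{a}_r/h + e\bar{a}_1 \ldots \bar{a}_{r-1}/a_rh \cdot c_r \\
&= e\bar{a}_1 \ldots \bar{a}_r/h + c_r\left\{e/a_rh - e\,\overline{\bar{a}_1 \ldots \bar{a}_{r-1}}/a_rh\right\} \\
&= e\bar{a}_1 \ldots \bar{a}_r/h + c_rp_r - e\,\overline{\bar{a}_1 \ldots \bar{a}_{r-1}}\,a_r/h,
\end{aligned}$$

where $\overline{\bar{a}_1 \ldots \bar{a}_{r-1}}$ asserts that at least one of the causes $A_1, \ldots, A_{r-1}$ is present.

$\therefore$ finally we have

$$e/h = e\bar{a}_1 \ldots \bar{a}_n/h + \sum_{r=1}^{n} c_rp_r - \sum_{r=2}^{n} e\,\overline{\bar{a}_1 \ldots \bar{a}_{r-1}}\,a_r/h.$$

But since the $n$ causes are supposed to be exhaustive,
$e\bar{a}_1 \ldots \bar{a}_n/h = 0$.

$$\therefore e/h = \sum_{r=1}^{n} c_rp_r - \sum_{r=2}^{n} e\,\overline{\bar{a}_1 \ldots \bar{a}_{r-1}}\,a_r/h. \qquad \text{(ii.)}$$

Let $e\,\overline{\bar{a}_1 \ldots \bar{a}_{r-1}}\,a_r/h = n_r$;

then

$$e/h = \sum_{r=1}^{n} c_rp_r - \sum_{r=2}^{n} n_r.$$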

1  -2 

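(57.1) If our knowledge of the several causes is independent,
if, that is to say, our knowledge of the existence of any one of
them is not relevant to the probability of the existence of any
other, so that $a_r/a_sh = a_r/h = c_r$, then

$$\begin{aligned}
n_r &= e\,\overline{\bar{a}_1 \ldots \bar{a}_{r-1}}\,a_r/h \\
&= c_r \cdot e/\overline{\bar{a}_1 \ldots \bar{a}_{r-1}}\,a_rh \cdot \left(1 - \bar{a}_1 \ldots \bar{a}_{r-1}/a_rh\right) \\
&= c_r\left\{1 - (1 - c_1) \ldots (1 - c_{r-1})\right\} e/\overline{\bar{a}_1 \ldots \bar{a}_{r-1}}\,a_rh.
\end{aligned}$$

Let $e/\overline{\bar{a}_1 \ldots \bar{a}_{r-1}}\,a_rh = m_r$;

then

$$e/h = \sum_{r=1}^{n} c_rp_r - \sum_{r=2}^{n} c_r\left\{1 - \prod_{s=1}^{r-1}(1 - c_s)\right\} m_r.$$

These results do not look very promising as they stand, but
they lead to some useful approximations on the elimination of
$m_r$ and $n_r$, and to some interesting special cases.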
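Equation (ii.) holds for any joint distribution in which E requires at least one of the causes, so a brute-force check is straightforward. The sketch below uses hypothetical names and random weights. It builds a distribution over the $2^{n+1}$ combinations of $a_1, \ldots, a_n, e$ and compares $e/h$ with $\sum c_rp_r - \sum n_r$.

```python
import itertools
import random


def check_equation_ii(n: int, seed: int) -> tuple[float, float]:
    """Return (e/h, sum of c_r p_r minus sum of n_r) for a random model with n causes."""
    rng = random.Random(seed)
    states = list(itertools.product((0, 1), repeat=n + 1))  # (a_1, ..., a_n, e)
    # E cannot happen unless at least one cause is present.
    weights = [0.0 if s[-1] and not any(s[:-1]) else rng.random() for s in states]
    total = sum(weights)
    joint = {s: w / total for s, w in zip(states, weights)}

    def prob(pred) -> float:
        return sum(p for s, p in joint.items() if pred(s[:-1], s[-1]))

    e_h = prob(lambda a, e: e)
    # c_r p_r = (a_r/h)(e/a_r h) = e a_r / h  (index r is 0-based here)
    sum_cp = sum(prob(lambda a, e, r=r: e and a[r]) for r in range(n))
    # n_r = P(e, a_r, and at least one earlier cause), for the 2nd to nth causes
    sum_n = sum(prob(lambda a, e, r=r: e and a[r] and any(a[:r])) for r in range(1, n))
    return e_h, sum_cp - sum_n


for n in (2, 3, 4):
    print(n, check_equation_ii(n, seed=n))  # the two numbers agree
```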
(57.2) From equation (i.) it follows that $e/h \geq c_1p_1$ and
$e/h \leq 1 - c_1(1 - p_1)$, and similarly for each of the other causes; and from
equation (ii.) that $e/h \leq \sum_{r=1}^{n} c_rp_r$;

$\therefore e/h$ lies between the greatest of $c_1p_1, \ldots, c_np_n$ and the least of
$1 - c_1(1 - p_1), \ldots, 1 - c_n(1 - p_n)$ and $\sum_{r=1}^{n} c_rp_r$.

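(57.3) Further, if the causes are independent, it follows from
(57.1), since $m_r$ cannot exceed unity, that

$$e/h \geq \sum_{r=1}^{n} c_rp_r - \sum_{r=2}^{n} c_r\left\{1 - \prod_{s=1}^{r-1}(1 - c_s)\right\} = 1 - \prod_{r=1}^{n}(1 - c_r) - \sum_{r=1}^{n} c_r(1 - p_r),$$

so that $e/h$ lies between
the greatest of $c_1p_1, \ldots, c_np_n$ and $1 - \prod_{r=1}^{n}(1 - c_r) - \sum_{r=1}^{n} c_r(1 - p_r)$,
and the
least of $\sum_{r=1}^{n} c_rp_r$, $1 - c_1(1 - p_1), \ldots, 1 - c_n(1 - p_n)$.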

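(57.4) Now consider the case in which $p_1 = p_2 = \ldots = p_n = 1$,
i.e. in which any of the causes would be sufficient, and in which
the causes are independent. Then $m_r = 1$, so that

$$e/h = \sum_{r=1}^{n} c_r - \sum_{r=2}^{n} c_r\left\{1 - \prod_{s=1}^{r-1}(1 - c_s)\right\} = 1 - (1 - c_1)(1 - c_2) \ldots (1 - c_n).$$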

(57.5) Let $c_1, c_2, \ldots, c_n$ be small quantities, so that their
squares and products may be neglected.

Then $e/h = \sum c_rp_r$,

i.e. the smaller the probabilities of the causes, the more do they
approach the condition of being mutually exclusive.1
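Results (57.4) and (57.5) are easy to check numerically. The sketch takes independent causes, each sufficient by itself ($p_r = 1$). It computes $1 - \prod(1 - c_r)$ both from the closed form and by summing over every combination of causes present. It then compares the result with the (57.5) approximation, which here is simply $\sum c_r$. The function names and the sample values are hypothetical.

```python
import itertools
import math


def any_sufficient_cause(c: list[float]) -> float:
    """(57.4): independent causes, each sufficient, so e/h = 1 - prod(1 - c_r)."""
    return 1 - math.prod(1 - cr for cr in c)


def any_sufficient_cause_enumerated(c: list[float]) -> float:
    """The same quantity, by summing over every combination of causes present."""
    total = 0.0
    for present in itertools.product((0, 1), repeat=len(c)):
        weight = math.prod(cr if on else 1 - cr for cr, on in zip(c, present))
        if any(present):  # with every p_r = 1, E follows whenever some cause is present
            total += weight
    return total


small = [0.01, 0.02, 0.03]
print(any_sufficient_cause(small), sum(small))  # 0.058906... against 0.06
```

The gap between the two printed values is made up of exactly the products of the $c_r$ that (57.5) neglects.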

(57.6) The a posteriori probability of a particular cause $a_r$
after the event has been observed is

$$a_r/eh = \frac{e/a_rh \cdot a_r/h}{e/h} = \frac{c_rp_r}{e/h}.$$

(This is Boole's problem IX., p. 357.)

1 Boole arrives at this result, Laws of Thought, p. 345, but I doubt his proof.
(58) The probability of the occurrence of a certain natural
phenomenon under given circumstances is $p$. There is also a
probability $a$ of a permanent cause of the phenomenon, i.e. of a
cause which would always produce the event under the circumstances
supposed. What is the probability that the phenomenon,
being observed $n$ times, will occur the $(n + 1)$th?

This is Boole's problem X. (Laws of Thought, p. 358). Boole
arrives by his own method at the same result as that given below.
It is necessary first of all to state the assumption somewhat
more precisely. If $x_r$ asserts the occurrence of the event at the
$r$th trial and $t$ the existence of the 'permanent cause', we have

$$x_r/h = p, \qquad t/h = a, \qquad x_r/th = 1,$$

and we require $x_{n+1}/x_1 \ldots x_nh = y_{n+1}$.

It is also assumed that if there is no permanent cause the probability
of $x_s$ is not affected by the observations $x_r$, etc., i.e.
$x_s/x_r \ldots x_t\bar{t}h = x_s/\bar{t}h$.1 Hence

$$x_r/\bar{t}h = \frac{x_r/h - x_rt/h}{\bar{t}/h} = \frac{p - a}{1 - a},$$

and, since $x_1 \ldots x_r/th = 1$,

$$x_1 \ldots x_r/h = t/h + x_1 \ldots x_r/\bar{t}h \cdot \bar{t}/h = a + (1 - a)\left(\frac{p - a}{1 - a}\right)^{r}.$$

But $y_r = x_r/x_1 \ldots x_{r-1}h = x_1 \ldots x_r/h \div x_1 \ldots x_{r-1}/h$,

i.e.

$$y_r = \frac{a + (p - a)\left(\frac{p - a}{1 - a}\right)^{r-1}}{a + (p - a)\left(\frac{p - a}{1 - a}\right)^{r-2}}.$$

Also $y_1 = p$ and $y_2 = \dfrac{a + (p - a)^2/(1 - a)}{p}$,

1  This  assumption,  which  is  tacitly  introduced  by  Boole,  is  not  generally 
justifiable.  I  use  it  here,  as  my  main  purpose  is  to  illustrate  a  method.  The 
same problem, without this assumption, will be discussed in dealing with Pure
Induction. 
so that

$$y_{n+1} = \frac{a + (p - a)\left(\frac{p - a}{1 - a}\right)^{n}}{a + (p - a)\left(\frac{p - a}{1 - a}\right)^{n-1}}.$$
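The formula for $y_r$ can be evaluated directly as a ratio of the probabilities $x_1 \ldots x_r/h$. The sketch below rests on the assumptions stated above, and also assumes $0 \leq a \leq p$, $p > 0$ and $a < 1$. The function names are hypothetical.

```python
def prob_first_r(r: int, p: float, a: float) -> float:
    """x_1 ... x_r / h = a + (1 - a) k**r, with k = (p - a)/(1 - a)."""
    k = (p - a) / (1 - a)
    return a + (1 - a) * k ** r


def y(r: int, p: float, a: float) -> float:
    """y_r = x_r / x_1 ... x_{r-1} h: the chance of the event at trial r,
    given that it occurred at all earlier trials."""
    return prob_first_r(r, p, a) / prob_first_r(r - 1, p, a)


print([round(y(r, 0.6, 0.2), 4) for r in range(1, 6)])  # rises from 0.6 towards 1
```

With $p = a$ the sketch gives $y_r = 1$ from the second trial onwards, which is the case treated in (58.1).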


(58.1) If $p = a$, $y_n = 1$ for every $n$ greater than 1; for if an event can only occur as the
result of a permanent cause, a single occurrence makes future
occurrences certain under similar conditions.

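(58.2) If $a < p < 1$, then $0 < \frac{p - a}{1 - a} < 1$, so that $\left(\frac{p - a}{1 - a}\right)^r$ is positive and decreases as $r$ increases.

As $n$ increases, $y_{n+1} = 1 - \epsilon$, where, by easy algebra,

$$\epsilon = \frac{(1 - p)(p - a)}{1 - a} \cdot \frac{\left(\frac{p - a}{1 - a}\right)^{n-1}}{a + (p - a)\left(\frac{p - a}{1 - a}\right)^{n-1}},$$

so that for any value of $\eta$, however small, a value of $n$ can be
found such that $\epsilon < \eta$, so long as $a$ is not zero.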

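(58.3) $t_n$, the a posteriori probability of a permanent cause
after $n$ successful observations, is

$$t_n = t/x_1 \ldots x_nh = \frac{x_1 \ldots x_n/th \cdot t/h}{x_1 \ldots x_n/h} = \frac{a}{a + (1 - a)\left(\frac{p - a}{1 - a}\right)^{n}},$$

i.e. $t_n = 1 - \epsilon'$, where

$$\epsilon' = \frac{(1 - a)\left(\frac{p - a}{1 - a}\right)^{n}}{a + (1 - a)\left(\frac{p - a}{1 - a}\right)^{n}}.$$

So that $t_n$ approaches the limit unity as $n$ increases, so long as $a$
is not zero.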
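The limiting behaviour in (58.2) and (58.3) is easy to see numerically. The sketch below relies on the same assumptions as (58), and its names are hypothetical. It also uses a convenient reading of the result that can be checked from the formulas above: $y_{n+1} = t_n \cdot 1 + (1 - t_n)\frac{p - a}{1 - a}$. In words, after $n$ successes the next trial is certain if the permanent cause exists, and has chance $\frac{p - a}{1 - a}$ otherwise.

```python
def t_posterior(n: int, p: float, a: float) -> float:
    """t_n = a / (a + (1 - a) k**n): probability of the permanent cause
    after n successes, with k = (p - a)/(1 - a)."""
    k = (p - a) / (1 - a)
    return a / (a + (1 - a) * k ** n)


def y_next(n: int, p: float, a: float) -> float:
    """y_{n+1} written as a mixture over the permanent cause and its absence."""
    k = (p - a) / (1 - a)
    t = t_posterior(n, p, a)
    return t * 1.0 + (1 - t) * k


for a in (0.0, 0.05, 0.2):
    print(a, [round(t_posterior(n, 0.6, a), 4) for n in (0, 5, 20, 80)])
```

For $a = 0$ the printed row stays at zero, and $y_{n+1}$ stays at $p$. This is the exception "so long as $a$ is not zero" in both (58.2) and (58.3).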

3.  The  following  is  a  common  type  of  statistical  problem.1 
We   are   given   a   series   of   measurements,   or   observations,   or 
estimates  of  the  true  value  of  a  given  quantity  ;   and  we  wish  to 
determine  what  function  of  these  measurements  will  yield  us 
the most probable value of the quantity, on the basis of this evidence.
The problem is not determinate unless we have some
good  ground  for  making  an  assumption  as  to  how  likely  we  are 
in  each  case  to  make  errors  of  given  magnitudes.     But  such  an 
assumption,  with  or  without  justification,  is  frequently  made. 

The functions of the original measurements which we commonly
employ, in order to yield us approximations to the most
probable  value  of  the  quantity  measured,  are  the  various  kinds 
of  means  or  averages  —  the  arithmetic  mean,  for  example,  or 
the  median.  The  relation,  which  we  assume,  between  errors  of 
different  magnitudes  and  the  probabilities  that  we  have  made 
errors  of  those  magnitudes,  is  called  a  law  of  error.  Corresponding 
to  each  law  of  error  which  we  might  assume,  there  is  some  function 
of  the  measurements  which  represents  the  most  probable  value 
of  the  quantity.  The  object  of  the  following  paragraphs  is  to 
discover  what  laws  of  error,  if  we  assume  them,  correspond  to 
each  of  the  simple  types  of  average,  and  to  discover  this  by  means 
of  a  systematic  method. 

4. Let us assume that the real value of the quantity is either
$b_1 \ldots b_r \ldots b_n$, and let $a_r$ represent the conclusion that the
value is, in fact, $b_r$. Further, let $x_r$ represent the evidence that
a measurement has been made of magnitude $y_r$.

If a measurement $y_p$ has been made, what is the probability
that the real value is $b_s$? The application of the theorem of
inverse probability yields the following result:

$$a_s/x_ph = \frac{x_p/a_sh \cdot a_s/h}{\sum_{r=1}^{n} x_p/a_rh \cdot a_r/h}$$

(the number of possible values of the quantity being $n$), where
$h$ stands for any other relevant evidence which we may have,
in addition to the fact that a measurement $x_p$ has been made.
Next, let us suppose that a number of measurements $y_1 \ldots y_m$
1 The substance of §§ 3-7 has been printed in the Journal of the Royal
Statistical Society, vol. lxxiv. p. 323 (February 1911).
have been made; what is now the probability that the real value
is $b_s$? We require the value of $a_s/x_1x_2 \ldots x_mh$. As before,

$$a_s/x_1x_2 \ldots x_mh = \frac{x_1x_2 \ldots x_m/a_sh \cdot a_s/h}{\sum_{r=1}^{n} x_1x_2 \ldots x_m/a_rh \cdot a_r/h}.$$

At  this  point  we  must  introduce  the  simplifying  assumption 
that,  if  we  knew  the  real  value  of  the  quantity,  the  different 
measurements  of  it  would  be  independent,  in  the  sense  that  a 
knowledge of what errors have actually been made in some of
the measurements would not affect in any way our estimate of
what errors are likely to be made in the others. We assume,
in fact, that $x_q/x_r \ldots x_ta_rh = x_q/a_rh$. This assumption is exceedingly
important. It is tantamount to the assumption that
our law of error is unchanged throughout the series of observations
in question. The general evidence $h$, that is to say, which justifies
our assumption of the particular law of error which we do assume,
is of such a character that a knowledge of the actual errors made
in a number of measurements, not more numerous than those
in question, is absolutely or approximately irrelevant to the
question of what form of law we ought to assume. The law
of error which we assume will be based, presumably, on an
experience of the relative frequency with which errors of different
magnitudes have been made under analogous circumstances in
the past. The above assumption will not be justified if the
additional experience, which a knowledge of the errors in the new
measurements would supply, is sufficiently comprehensive, relatively
to our former experience, to be capable of modifying our
assumption as to the shape of the law of error, or if it suggests
that the circumstances, in which the measurements are being
carried out, are not so closely analogous as was originally supposed.

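With this assumption, i.e. that $x_1$, etc., are independent of
one another relatively to evidence $a_rh$, etc., it follows from the
ordinary rule for the multiplication of independent probabilities
that

$$x_1x_2 \ldots x_m/a_rh = \prod_{q=1}^{m} x_q/a_rh.$$

Hence

$$a_s/x_1x_2 \ldots x_mh = \frac{a_s/h \cdot \prod_{q=1}^{m} x_q/a_sh}{\sum_{r=1}^{n}\left(a_r/h \cdot \prod_{q=1}^{m} x_q/a_rh\right)}.$$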
The most probable value of the quantity under measurement,
given the $m$ measurements $y_1$, etc. (which is our quaesitum), is
therefore that value which makes the above expression a maximum.
Since the denominator is the same for all values of $b_s$,
we must find the value which makes the numerator a maximum.
Let us assume that $a_1/h = a_2/h = \ldots = a_n/h$. We assume, that
is to say, that we have no reason a priori (i.e. before any measurements
have been made) for thinking any one of the possible
values of the quantity more likely than any other. We require,
therefore, the value of $b_s$ which makes the expression $\prod_{q=1}^{m} x_q/a_sh$
a maximum. Let us denote this value by $y$.
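For a finite set of candidate values, the posterior above can be computed directly. The sketch assumes a hypothetical law of error proportional to $e^{-k^2(y_q - b)^2}$ with $k = 1$, used only as a stand-in for $x_q/a_sh$. It also assumes equal prior probabilities $a_r/h$ and a grid of candidate values $b_r$. None of these choices comes from the text, and the names are illustrative.

```python
import math


def posterior(values, measurements, likelihood, prior=None):
    """a_s / x_1 ... x_m h for each candidate value b_s.

    values       -- candidate true values b_1 ... b_n
    measurements -- observed magnitudes y_1 ... y_m
    likelihood   -- likelihood(y_q, b_s), standing for x_q / a_s h
    prior        -- a_s / h for each b_s (equal by default)
    """
    n = len(values)
    if prior is None:
        prior = [1.0 / n] * n
    numerators = [
        pr * math.prod(likelihood(yq, b) for yq in measurements)
        for pr, b in zip(prior, values)
    ]
    total = sum(numerators)  # the denominator, common to every b_s
    return [num / total for num in numerators]


def normal_law(yq: float, b: float, k: float = 1.0) -> float:
    """Hypothetical law of error, proportional to exp(-k^2 (y_q - b)^2)."""
    return math.exp(-(k ** 2) * (yq - b) ** 2)


grid = [i / 10 for i in range(101)]  # candidate values 0.0, 0.1, ..., 10.0
post = posterior(grid, [4.0, 5.0, 6.3], normal_law)
best = grid[max(range(len(grid)), key=post.__getitem__)]
print(best)  # 5.1, the arithmetic mean of the three measurements
```

Because the prior is uniform, the most probable candidate is simply the one maximising the product of the likelihoods, as the text requires.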

We can make no further progress without a further assumption.
Let us assume that $x_q/a_sh$ (namely, the probability of a
measurement $y_q$, assuming the real value to be $b_s$) is an algebraic
function $f$ of $y_q$ and $b_s$, the same function for all values of $y_q$ and
$b_s$ within the limits of the problem.1 We assume, that is to say,
$x_q/a_sh = f(y_q, b_s)$, and we have to find the value of $b_s$, namely $y$,
which makes $\prod_{q=1}^{m} f(y_q, y)$ a maximum. Equating to zero the
differential coefficient of this expression with respect to $y$, and dividing
through by the product,2 we have

$$\sum_{q=1}^{m} \frac{f'(y_q, y)}{f(y_q, y)} = 0, \qquad \text{where } f' = \frac{d f(y_q, y)}{dy}.$$

This equation may be written for brevity in the form $\sum f'/f = 0$.

If we solve this equation for $y$, the result gives us the value of
the quantity under observation which is most probable relatively
to the measurements we have made.

The act of differentiation assumes that the possible values of $y$
are so numerous and so uniformly distributed within the range
in question that we may, without sensible error, regard them as
continuous.
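When $f$ is differentiable, the most probable value can be found by solving $\sum f'/f = 0$ directly. The sketch below does this by bisection. It assumes the sum is positive at the smallest measurement, negative at the largest, and has a single root between them; this holds for the two laws and the data tried here. The Cauchy-type law $f \propto 1/\{1 + (y - y_q)^2\}$ is a hypothetical example, chosen only because its root is not a simple average. The names are illustrative.

```python
import math


def score(y: float, measurements: list[float], dlogf) -> float:
    """Left-hand side of sum f'/f = 0, where dlogf(y_q, y) returns f'/f."""
    return sum(dlogf(yq, y) for yq in measurements)


def most_probable_value(measurements: list[float], dlogf, tol: float = 1e-12) -> float:
    """Bisection for the root of sum f'/f = 0 between the extreme measurements.

    Assumes the sum is positive at the smallest measurement, negative at the
    largest, and has exactly one root in between.
    """
    lo, hi = min(measurements), max(measurements)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid, measurements, dlogf) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def normal_dlogf(yq: float, y: float) -> float:
    return -2 * (y - yq)  # f proportional to exp(-(y - y_q)^2)


def cauchy_dlogf(yq: float, y: float) -> float:
    d = y - yq
    return -2 * d / (1 + d * d)  # f proportional to 1 / (1 + (y - y_q)^2)


data = [1.0, 1.5, 2.2]
print(most_probable_value(data, normal_dlogf))  # the arithmetic mean, about 1.5667
print(most_probable_value(data, cauchy_dlogf))  # a different value, about 1.54
```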

5.  This  completes  the  prolegomena  of  the  inquiry.     We  are 

1 Gauss, in obtaining the normal law of error, made, in effect, the more
special assumption that $x_q/a_sh$ is a function of $e_q$ only, where $e_q$ is the error and
$e_q = b_s - y_q$. We shall find in the sequel that all symmetrical laws of error,
such that positive and negative errors of the same absolute magnitude are
equally likely, satisfy this condition (the normal law, for example, and the
simplest median law). But other laws, such as those which lead to the geometric
mean, do not satisfy it.

2 Since none of the measurements actually made can be impossible, none of
the expressions $f(y_q, y)$ can vanish.
now in a position to discover what laws of error correspond to
given assumptions respecting the algebraic relation between the
measurements and the most probable value of the quantity, and
vice versa. For the law of error determines the form of $f(y_q, y)$,
and the form of $f(y_q, y)$ determines the algebraic relation $\sum f'/f = 0$
between the measurements and the most probable value. It
may be well to repeat that $f(y_q, y)$ denotes the probability to
us that an observer will make a measurement $y_q$ in observing a
quantity whose true value we know to be $y$. A law of error tells
us what this probability is for all possible values of $y_q$ and $y$
within the limits of the problem.

(i.) If the most probable value of the quantity is equal to the
arithmetic mean of the measurements, what law of error does this
imply?

Writing $f_q$ for $f(y_q, y)$, $\sum f'_q/f_q = 0$ must be equivalent to $\sum_{q=1}^{m}(y - y_q) = 0$, since the
most probable value $y$ must equal $\frac{1}{m}\sum_{q=1}^{m} y_q$.

$$\therefore \frac{f'_q}{f_q} = \phi''(y)(y - y_q),$$

where $\phi''(y)$ is some function which
is not zero and is independent of $y_q$.
Integrating,

$$\log f_q = \int \phi''(y)(y - y_q)\,dy + \psi(y_q) = \phi'(y)(y - y_q) - \phi(y) + \psi(y_q),$$

where $\psi(y_q)$ is some function independent of $y$.

So that

$$f_q = e^{\phi'(y)(y - y_q) - \phi(y) + \psi(y_q)}.$$

Any law of error of this type, therefore, leads to the arithmetic
mean of the measurements as the most probable value of the
quantity measured.
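The claim that any law of this type leads to the arithmetic mean can be checked with a law that is not the normal one. The sketch takes the hypothetical choices $\phi(y) = -\cosh y$ (so that $\phi''(y) = -\cosh y$ never vanishes) and $\psi(y_q) = -y_q^2$. It finds, by brute-force search over an arbitrary grid, the value of $y$ that maximises the product of the $f_q$.

```python
import math


def log_f(yq: float, y: float) -> float:
    """log f_q = phi'(y)(y - y_q) - phi(y) + psi(y_q), with the hypothetical
    choices phi(y) = -cosh(y) and psi(y_q) = -y_q**2."""
    return -math.sinh(y) * (y - yq) + math.cosh(y) - yq ** 2


def grid_maximum(measurements: list[float], lo: float, hi: float, steps: int = 30000) -> float:
    """Grid value of y that maximises the product of the f_q (via its logarithm)."""
    best_y, best_val = lo, -math.inf
    for i in range(steps + 1):
        y = lo + (hi - lo) * i / steps
        val = sum(log_f(yq, y) for yq in measurements)
        if val > best_val:
            best_y, best_val = y, val
    return best_y


data = [0.2, 0.5, 1.1]
print(grid_maximum(data, -1.0, 2.0), sum(data) / len(data))  # both about 0.6
```

The sum of the $\log f_q$ has derivative $-m\cosh(y)(y - \bar{y})$, where $\bar{y}$ is the mean. The maximum therefore falls at the mean whatever the particular $\phi$, provided $\phi''$ is negative.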

If we put $\phi(y) = -k^2y^2$ and $\psi(y_q) = -k^2y_q^2 + \log A$, we obtain

$$f_q = Ae^{-k^2(y - y_q)^2} = Ae^{-k^2z_q^2},$$

where $z_q$ is the absolute magnitude of the error in
the measurement $y_q$: the form normally assumed.

This is, clearly, only one amongst a number of possible solutions.
But with one additional assumption we can prove that
this is the only law of error which leads to the arithmetic mean.
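Let us assume that negative and positive errors of the same
absolute amount are equally likely.

In this case $f_q$ must be of the form $Be^{\theta\{(y - y_q)^2\}}$.

$$\therefore \theta\{(y - y_q)^2\} + \log B = \phi'(y)(y - y_q) - \phi(y) + \psi(y_q).$$

Differentiating with respect to $y$,

$$2(y - y_q)\,\theta'\{(y - y_q)^2\} = \phi''(y)(y - y_q).$$

But $\phi''(y)$ is, by hypothesis, independent of $y_q$, while the left-hand side depends only on $y - y_q$; both must therefore be constant.

$$\therefore \theta'\{(y - y_q)^2\} = \tfrac{1}{2}\phi''(y) = -k^2,$$

where $k$ is constant; integrating,
$\theta\{(y - y_q)^2\} = -k^2(y - y_q)^2 + \log C$, and we have $f_q = Ae^{-k^2(y - y_q)^2}$ (where
$A = BC$).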

(ii.) What is the law of error, if the geometric mean of the
measurements leads to the most probable value of the quantity?

In this case $\sum f'_q/f_q = 0$ must be equivalent to $\prod_{q=1}^{m} y_q = y^m$, i.e. to

$$\sum_{q=1}^{m} \log\frac{y_q}{y} = 0.$$

Proceeding as before, we find that the law of error is

$$f_q = e^{\int \phi''(y)\log\frac{y_q}{y}\,dy + \psi(y_q)}.$$

There is no solution of this which satisfies the condition that
negative and positive errors of the same absolute magnitude are
equally likely. For we must have

$$\int \phi''(y)\log\frac{y_q}{y}\,dy + \psi(y_q) = \theta\{(y - y_q)^2\},$$

or

$$\phi''(y)\log\frac{y_q}{y} = \frac{d}{dy}\,\theta\{(y - y_q)^2\},$$

which is impossible.

The simplest law of error which leads to the geometric mean
seems to be obtained by putting $\phi'(y) = -ky$, $\psi(y_q) = 0$. This
gives

$$f_q = Ae^{-ky}\left(\frac{y}{y_q}\right)^{ky}.$$

A  law  of  error,  which  leads  to  the  geometric  mean  of  the 
observations  as  the  most  probable  value  of  the  quantity,  has  been 
previously  discussed  by  Sir  Donald  McAlister  (Proceedings  of  the 
Royal Society, vol. xxix. (1879) p. 365). His investigation
depends upon the obvious fact that, if the geometric mean of the
observations yields the most probable value of the quantity, the
arithmetic  mean  of  the  logarithms  of  the  observations  must  yield 
the  most  probable  value  of  the  logarithm  of  the  quantity.  Hence, 
if  we  suppose  that  the  logarithms  of  the  observations  obey  the 
normal  law  of  error  (which  leads  to  their  arithmetic  mean  as  the 
most  probable  value  of  the  logarithms  of  the  quantity),  we  can 
by  substitution  find  a  law  of  error  for  the  observations  themselves 
which  must  lead  to  the  geometric  mean  of  them  as  the  most 
probable  value  of  the  quantity  itself. 

If, as before, the observations are denoted by $y_q$, etc., and the quantity by $y$, let their logarithms be denoted by $l_q$, etc., and by $l$. Then, if $l_q$, etc., obey the normal law of error, $f(l_q, l) = A e^{-k^2 (l_q - l)^2}$. Hence the law of error for $y_q$, etc., is determined by

$$f_q = A e^{-k^2 (\log y_q - \log y)^2},$$

and the most probable value of $y$ must, clearly, be the geometric mean of $y_q$, etc.
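A similar sketch for McAlister's law $f_q = A e^{-k^2(\log y_q - \log y)^2}$: the grid maximiser of $\sum \log f_q$ should coincide with the geometric mean rather than the arithmetic mean. The measurements and the value $k = 1$ are hypothetical, chosen so that the two means differ visibly.

```python
import math


def log_f_mcalister(y, yq, k=1.0):
    # log of f_q = A exp(-k^2 (log y_q - log y)^2), with log A dropped.
    return -k * k * (math.log(yq) - math.log(y)) ** 2


def most_probable_value(observations, k=1.0, steps=20_000):
    # Brute-force grid search between the smallest and largest measurement.
    lo, hi = min(observations), max(observations)
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda y: sum(log_f_mcalister(y, yq, k) for yq in observations))


measurements = [2.0, 8.0, 4.0]  # hypothetical positive measurements
geometric_mean = math.exp(sum(math.log(v) for v in measurements) / len(measurements))
arithmetic_mean = sum(measurements) / len(measurements)
estimate = most_probable_value(measurements)
print(estimate, geometric_mean, arithmetic_mean)
```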

This is the law of error which was arrived at by Sir Donald McAlister. It can easily be shown that it is a special case of the generalised form which I have given above of all laws of error leading to the geometric mean. For if we put $\psi(y_q) = -k^2 (\log y_q)^2$ and $\phi'(y) = 2k^2 \log y$, we have

$$f_q = A e^{2k^2 \log y \,\log(y_q/y) + \int \frac{2k^2 \log y}{y}\,dy - k^2 (\log y_q)^2} = A e^{-k^2\{(\log y_q)^2 - 2\log y_q \log y + (\log y)^2\}} = A e^{-k^2 (\log y_q - \log y)^2}.$$

A similar result has been obtained by Professor J. C. Kapteyn.¹ But he is investigating frequency curves, not laws of error, and this result is merely incidental to his main discussion. His method, however, is not unlike a more generalised form of Sir Donald McAlister's. In order to discover the frequency curve of certain quantities y, he supposes that there are certain other quantities z, functions of the quantities y, which are given by z = F(y), and that the frequency curve of these quantities z is normal. By this device he is enabled, in the investigation of a type of skew frequency curve which is likely to be met with often, to utilise certain statistical constants corresponding to those which have been already calculated for the normal curve.

¹ Skew Frequency Curves, p. 22, published by the Astronomical Laboratory at Groningen (1903).

In fact the main advantage both of Sir Donald McAlister's law of error and of Professor Kapteyn's frequency curves lies in the possibility of adapting without much trouble to unsymmetrical phenomena numerous expressions which have been already calculated for the normal law of error and the normal curve of frequency.¹

This method of proceeding from arithmetic to geometric laws of error is clearly capable of generalisation. We have dealt with the geometric law which can be derived from the normal arithmetic law. Similarly, if we start from the simplest geometric law of error, namely $f_q = A (y/y_q)^{k^2 y} e^{-k^2 y}$, we can easily find, by writing $\log y = l$ and $\log y_q = l_q$, the corresponding arithmetic law, namely $f_q = A e^{k^2 e^l (l - l_q) - k^2 e^l}$, which is obtained from the generalised arithmetic law by putting $\phi(l) = k^2 e^l$ and $\psi(l_q) = 0$. And, in general, corresponding to the arithmetic law

$$f_q = A e^{\phi'(y)(y - y_q) - \phi(y) + \psi(y_q)}$$

we have the geometric law

$$f_q = A e^{\chi(z)\log(z/z_q) - \int \frac{\chi(z)}{z}\,dz + \zeta(z_q)},$$

where

$$y = \log z, \quad y_q = \log z_q, \quad \int \frac{\chi(z)}{z}\,dz = \phi(\log z), \quad \zeta(z_q) = \psi(\log z_q).$$
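The correspondence between arithmetic and geometric laws can be checked numerically. The sketch below assumes $\phi(l) = k^2 e^l$ and $\psi = 0$, evaluates the exponent of the arithmetic law $A e^{\phi'(l)(l - l_q) - \phi(l)}$ at $l = \log y$, $l_q = \log y_q$, and compares it with the exponent of the geometric law $A (y/y_q)^{k^2 y} e^{-k^2 y}$. The constant k = 1.3 and the test pairs are arbitrary choices for illustration.

```python
import math

K = 1.3  # hypothetical value of the constant k


def arithmetic_exponent(l, lq, k=K):
    # Exponent of f_q = A exp(phi'(l)(l - l_q) - phi(l) + psi(l_q))
    # with phi(l) = k^2 e^l (so phi'(l) = phi(l)) and psi = 0.
    phi = k * k * math.exp(l)
    return phi * (l - lq) - phi


def geometric_exponent(y, yq, k=K):
    # log of (y / y_q)^(k^2 y) * exp(-k^2 y), i.e. the geometric law with A dropped.
    return k * k * y * math.log(y / yq) - k * k * y


pairs = [(2.0, 3.0), (5.0, 1.5), (0.7, 0.7)]
for y, yq in pairs:
    a = arithmetic_exponent(math.log(y), math.log(yq))
    g = geometric_exponent(y, yq)
    print(f"y={y}, y_q={yq}: {a:.12f} vs {g:.12f}")
```

Each printed pair should agree to rounding error, since the substitution $l = \log y$ turns one exponent into the other term by term.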


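(iii.) What law of error does the harmonic mean imply?

In this case, $\sum\theta = 0$ must be equivalent to $\sum\left(\frac{1}{y_q} - \frac{1}{y}\right) = 0$.

Proceeding as before, we find that

$$f_q = A e^{\phi'(y)\left(\frac{1}{y_q} - \frac{1}{y}\right) - \int \frac{\phi'(y)}{y^2}\,dy + \psi(y_q)}.$$

A simple form of this is obtained by putting $\phi'(y) = -k^2 y^2$ and $\psi(y_q) = -k^2 y_q$. Then

$$f_q = A e^{-k^2 (y - y_q)^2 / y_q} = A e^{-k^2 z_q^2 / y_q},$$

where $z_q = |y - y_q|$. With this law, positive and negative errors of the same absolute magnitude are not equally likely.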
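For the harmonic-mean law $f_q = A e^{-k^2(y - y_q)^2/y_q}$, a sketch in the same style confirms two things: the grid maximiser of $\sum \log f_q$ falls at the harmonic mean, and an overshoot and an undershoot of equal size receive different probabilities. The measurements and k = 1 are hypothetical.

```python
def log_f_harmonic(y, yq, k=1.0):
    # log of f_q = A exp(-k^2 (y - y_q)^2 / y_q), with log A dropped.
    return -k * k * (y - yq) ** 2 / yq


def most_probable_value(observations, k=1.0, steps=20_000):
    # Brute-force grid search between the smallest and largest measurement.
    lo, hi = min(observations), max(observations)
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda y: sum(log_f_harmonic(y, yq, k) for yq in observations))


measurements = [2.0, 3.0, 6.0]  # hypothetical positive measurements
harmonic_mean = len(measurements) / sum(1.0 / v for v in measurements)
estimate = most_probable_value(measurements)

# Asymmetry: with true value y = 3, an error of +1 and an error of -1 are not equally likely.
over = log_f_harmonic(3.0, 4.0)   # measurement overshoots by 1
under = log_f_harmonic(3.0, 2.0)  # measurement undershoots by 1
print(estimate, harmonic_mean, over, under)
```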

¹ It may be added that Professor Kapteyn's monograph brings forward considerations which would be extremely valuable in determining the types of phenomena to which geometric laws of error are likely to be applicable.

(iv.) If the most probable value of the quantity is equal to the median of the measurements, what is the law of error?

The median is usually defined as the measurement which occupies the middle position when the measurements are ranged in order of magnitude. If the number of measurements $m$ is odd, the most probable value of the quantity is the $\tfrac{m+1}{2}$th, and, if the number is even, all values between the $\tfrac{m}{2}$th and the $\left(\tfrac{m}{2} + 1\right)$th are equally probable amongst themselves and more probable than any other. For the present purpose, however, it is necessary to make use of another property of the median, which was known to Fechner (who first introduced the median into use) but which seldom receives as much attention as it deserves. If y is the median of a number of magnitudes, the sum of the absolute differences (i.e. the differences always reckoned positive) between y and each of the magnitudes is a minimum. The median $y$ of $y_1, y_2, \ldots, y_m$ is found, that is to say, by making $\sum_{q=1}^{m} |y_q - y|$ a minimum, where $|y_q - y|$ is the difference, always reckoned positive, between $y_q$ and $y$.
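The minimising property of the median is easy to check by brute force. In this sketch (the magnitudes are invented) the sum $\sum |y_q - y|$ is evaluated over a grid of candidate values of y; for an odd number of magnitudes the minimum falls at the middle one, and for an even number it is flat between the two middle ones, which matches the description of the even case given above.

```python
def sum_abs_dev(y, values):
    # Sum of |y_q - y|, each difference reckoned positive.
    return sum(abs(v - y) for v in values)


def median(values):
    s = sorted(values)
    m = len(s)
    if m % 2 == 1:
        return s[m // 2]  # the ((m + 1) / 2)th value, counting from one
    # For even m any point between the two middle values minimises the sum;
    # the midpoint is returned here only as a conventional choice.
    return (s[m // 2 - 1] + s[m // 2]) / 2


odd = [7, 1, 4, 10, 2]  # hypothetical magnitudes, m odd
even = [1, 2, 8, 20]    # hypothetical magnitudes, m even
grid = [i / 10 for i in range(-50, 251)]  # candidate values of y from -5.0 to 25.0

for values in (odd, even):
    best = min(sum_abs_dev(y, values) for y in grid)
    print(median(values), sum_abs_dev(median(values), values), best)
```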

We can now return to the investigation of the law of error corresponding to the median.

Write $|y - y_q| = z_q$. Then, since $\sum_{q=1}^{m} z_q$ is to be a minimum, we must have $\sum_{q=1}^{m} \frac{dz_q}{dy} = 0$. Whence, proceeding as before, we have

$$f_q = A e^{\int \phi''(y)\,\frac{dz_q}{dy}\,dy + \psi(y_q)}.$$

The simplest case of this is obtained by putting $\phi''(y) = -k^2$ and $\psi(y_q) = 0$, whence $f_q = A e^{-k^2 z_q}$.

This satisfies the additional condition that positive and negative errors of equal magnitude are equally likely. Thus in this important respect the median is as satisfactory as the arithmetic mean, and the law of error which leads to it is as simple. It also resembles the normal law in that it is a function of the error only, and not of the magnitude of the measurement as well.

The median law of error, $f_q = A e^{-k^2 z_q}$, where $z_q$ is the absolute amount of the error always reckoned positive, is of some historical interest, because it was the earliest law of error to be formulated. The first attempt to bring the doctrine of averages into definite relation with the theory of probability and with laws of error was published by Laplace in 1774 in a memoir "sur la probabilité des causes par les événemens."¹ This memoir was not subsequently incorporated in his Théorie analytique, and does not represent his more mature view. In the Théorie he drops altogether the law tentatively adopted in the memoir, and lays down the main lines of investigation for the next hundred years by the introduction of the normal law of error. The popularity of the normal law, with the arithmetic mean and the method of least squares as its corollaries, has been very largely due to its overwhelming advantages, in comparison with all other laws of error, for the purposes of mathematical development and manipulation. And in addition to these technical advantages, it is probably applicable as a first approximation to a larger and more manageable group of phenomena than any other single law. So powerful a hold indeed did the normal law obtain on the minds of statisticians, that until quite recent times only a few pioneers have seriously considered the possibility of preferring in certain circumstances other means to the arithmetic and other laws of error to the normal. Laplace's earlier memoir fell, therefore, out of remembrance. But it remains interesting, if only for the fact that a law of error there makes its appearance for the first time.

Laplace sets himself the problem in a somewhat simplified form: "Déterminer le milieu que l'on doit prendre entre trois observations données d'un même phénomène" (to determine the mean that ought to be taken between three given observations of the same phenomenon). He begins by assuming a law of error $z = \phi(y)$, where z is the probability of an error y; and finally, by means of a number of somewhat arbitrary assumptions, arrives at the result $\phi(y) = \frac{m}{2} e^{-my}$. If this formula is to follow from his arguments, y must denote the absolute error, always taken positive. It is not unlikely that Laplace was led to this result by considerations other than those by which he attempts to justify it.

Laplace, however, did not notice that his law of error led to the median. For, instead of finding the most probable value, which would have led him straight to it, he seeks the "mean of error": the value, that is to say, which the true value is as likely to fall short of as to exceed. This value is, for the median law, laborious to find and awkward in the result. Laplace works it out correctly for the case where the observations are no more than three.

¹ Mémoires présentés à l'Académie des Sciences, vol. vi.

6. I do not think that it is possible to find by this method a law of error which leads to the mode. But the following general formulae are easily obtained:

(v.) If $\sum\theta(y_q, y) = 0$ is the law of relation between the measurements and the most probable value of the quantity, then the law of error $f_q(y_q, y)$ is given by

$$f_q = A e^{\int \theta(y_q, y)\,\phi''(y)\,dy + \psi(y_q)}.$$

Since $f_q$ lies between 0 and 1, $\int \theta(y_q, y)\,\phi''(y)\,dy + \psi(y_q) + \log A$ must be negative for all values of $y_q$ and $y$ that are physically possible; and, since the values of $y_q$ are between them exhaustive,

$$\sum_{y_q} A e^{\int \theta(y_q, y)\,\phi''(y)\,dy + \psi(y_q)} = 1,$$

where the summation is for all terms that can be formed by giving $y_q$ every value a priori possible.

(vi.) The most general form of the law of error, when it is assumed that positive and negative errors of the same magnitude are equally probable, is $A e^{-f((y - y_q)^2)}$, where the most probable value of the quantity is given by the equation

$$\sum (y - y_q)\, f'((y - y_q)^2) = 0, \quad \text{where} \quad f'((y - y_q)^2) = \frac{d}{d(y - y_q)^2}\, f((y - y_q)^2).$$

The arithmetic mean is a special case of this, obtained by putting $f((y - y_q)^2) = (y - y_q)^2$; and the median is a special case obtained by putting $f((y - y_q)^2) = +\sqrt{(y - y_q)^2}$.

We can obtain other special cases by putting

$$f((y - y_q)^2) = k^2 (y - y_q)^4,$$

when the law of error is $A e^{-k^2 (y - y_q)^4}$ and the most probable values are the roots of $m y^3 - 3y^2 \sum y_q + 3y \sum y_q^2 - \sum y_q^3 = 0$; and by putting

$$f((y - y_q)^2) = \log (y - y_q)^2,$$

when the law of error is $\dfrac{A}{(y - y_q)^2}$, and the most probable values are the roots of $\sum \frac{1}{y - y_q} = 0$. In all these cases the law is a function of the error only.
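For the law $A e^{-k^2 (y - y_q)^4}$ the most probable value is the real root of $\sum (y - y_q)^3 = 0$. Since that sum never decreases as y increases, bisection between the smallest and largest measurement finds it. The sketch below does this for a deliberately lopsided, hypothetical set of measurements and sets the result beside the arithmetic mean and the median; for the values 0, 0, 3 the root can also be written in closed form as $3/(1 + 2^{1/3}) \approx 1.33$.

```python
def cubic_condition(y, observations):
    # Sum of (y - y_q)^3; its root is the most probable value under
    # the law f_q = A exp(-k^2 (y - y_q)^4).
    return sum((y - yq) ** 3 for yq in observations)


def most_probable_quartic(observations, tol=1e-12):
    # The condition is <= 0 at the smallest measurement, >= 0 at the largest,
    # and nondecreasing in between, so bisection finds its real root.
    lo, hi = min(observations), max(observations)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cubic_condition(mid, observations) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


measurements = [0.0, 0.0, 3.0]  # hypothetical, deliberately lopsided
quartic = most_probable_quartic(measurements)
mean = sum(measurements) / len(measurements)
median = sorted(measurements)[len(measurements) // 2]
print(quartic, mean, median)
```

The three rules give three different answers for the same data, which is the practical content of saying that each mean corresponds to its own law of error.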

7. These results may be summarised thus. We have assumed:

(a) That we have no reason, before making measurements, for supposing that the quantity we measure is more likely to have any one of its possible values than any other.

(b) That the errors are independent, in the sense that a knowledge of how great an error has been made in one case does not affect our expectation of the probable magnitude of the error in the next.

(c) That the probability of a measurement of given magnitude, when in addition to the a priori evidence the real value of the quantity is supposed known, is an algebraic function of this given magnitude of the measurement and of the real value of the quantity.

(d) That we may regard the series of possible values as continuous, without sensible error.

(e) That the a priori evidence permits us to assume a law of error of the type specified in (c); i.e. that the algebraic function referred to in (c) is known to us a priori.

Subject to these assumptions, we have reached the following conclusions:

(1) The most general form of the law of error is

$$f_q = A e^{\int \theta(y_q, y)\,\phi''(y)\,dy + \psi(y_q)},$$

leading to the equation $\sum \theta(y_q, y) = 0$, connecting the most probable value and the actual measurements, where $y$ is the most probable value and $y_q$, etc., the measurements.

(2) Assuming that positive and negative errors of the same absolute magnitude are equally likely, the most general form is $f_q = A e^{-f((y - y_q)^2)}$, leading to the equation $\sum (y - y_q)\, f'((y - y_q)^2) = 0$, where $f'(z) = \frac{d}{dz} f(z)$. Of the special cases to which this form gives rise, the most interesting were

(3) $f_q = A e^{-k^2 (y - y_q)^2} = A e^{-k^2 z_q^2}$, where $z_q = |y - y_q|$, leading to the arithmetic mean of the measurements as the most probable value of the quantity; and

(4) $f_q = A e^{-k^2 z_q}$, leading to the median.

(5) The most general form leading to the arithmetic mean is $f_q = A e^{\phi'(y)(y - y_q) - \phi(y) + \psi(y_q)}$, with the special cases (3), and

(6) $f_q = A e^{k^2 e^y (y - y_q) - k^2 e^y}$.

(7) The most general form leading to the geometric mean is $f_q = A e^{\phi'(y)\log(y_q/y) + \int \frac{\phi'(y)}{y}\,dy + \psi(y_q)}$, with the special cases:

(8) $f_q = A (y/y_q)^{k^2 y} e^{-k^2 y}$, and

(9) $f_q = A e^{-k^2 (\log(y_q/y))^2}$.

(10) The most general form leading to the harmonic mean is $f_q = A e^{\phi'(y)\left(\frac{1}{y_q} - \frac{1}{y}\right) - \int \frac{\phi'(y)}{y^2}\,dy + \psi(y_q)}$, with the special case

(11) $f_q = A e^{-k^2 (y - y_q)^2 / y_q} = A e^{-k^2 z_q^2 / y_q}$.

(12) The most general form leading to the median is $f_q = A e^{\int \phi''(y)\,\frac{dz_q}{dy}\,dy + \psi(y_q)}$, with the special case (4).

In each of these expressions, $f_q$ is the probability of a measurement $y_q$, given that the true value is $y$.

8. The doctrine of Means and the allied theory of Least Squares comprise so extensive a subject-matter that they cannot be adequately treated except in a volume primarily devoted to them. As, however, they are one of the important practical applications of the theory of probability, I am unwilling to pass them by entirely; and the following discursive observations, chiefly relating to the Normal Law of Error, will serve, taken in conjunction with the paragraphs immediately preceding, to illustrate the connection between the theories of this treatise and the general treatment of averages.

9.  The  Claims  of  the  Arithmetic  Average.  —  By  definition  the 
arithmetic  average  of  a  number  of  quantities  is  nothing  more 
than their arithmetic sum divided by their number. But the utility of an average generally consists in our supposed right to substitute, in certain cases, this single measure for the varying measures of which it is a function. Sometimes this requires no justification; the word "average" is in these cases used for the sake of shortness, and merely to summarise a set of facts: as, for instance, when we say that the birth-rate in England is greater than the birth-rate in France.

But there are other cases in which the average makes a more substantial claim to add to our knowledge. After a number of examiners of equal capacity have given varying marks to a candidate for the same paper, it may be thought fair to allow the candidate the average of the different marks allotted: and in general if several estimates of a magnitude have been made, between the accuracy of which we have no reason to discriminate, we often think it reasonable to act as if the true magnitude were the average of the several measurements. Perhaps De Witt, in his report on Annuities to the States General in 1671,¹ was the first to use it scientifically. But as Leibniz points out: "Our peasants have made use of it for a long time according to their natural mathematics. For example, when some inheritance or land is to be sold, they form three bodies of appraisers; these bodies are called Schurzen in Low Saxon, and each body makes an estimate of the property in question. Suppose, then, that the first estimates its value to be 1000 crowns, the second, 1400, the third, 1500; the sum of these three estimates is taken, viz. 3900, and because they were three bodies, the third, i.e. 1300, is taken as the mean value asked for. This is the axiom: aequalibus aequalia, equal suppositions must have equal consideration."²

But this is a very inadequate axiom. Equal suppositions would have equal consideration if the three estimates had been multiplied together instead of being added. The truth is that at all times the arithmetic mean has had simplicity to recommend it. It is always easier to add than to multiply. But simplicity is a dangerous criterion: "La nature," says Fresnel, "ne s'est pas embarrassée des difficultés d'analyse, elle n'a évité que la complication des moyens" (Nature has not troubled herself about difficulties of analysis; she has avoided only complication of means).
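Leibniz's figures make the point concrete. In the short sketch below, both the arithmetic mean and the geometric mean treat the three appraisals symmetrically, so each satisfies the axiom of "equal consideration", yet they disagree.

```python
import math

estimates = [1000, 1400, 1500]  # the three appraisals in Leibniz's example (crowns)

arithmetic = sum(estimates) / len(estimates)              # (1000 + 1400 + 1500) / 3
geometric = math.prod(estimates) ** (1 / len(estimates))  # cube root of the product

# Both rules treat the three appraisals symmetrically ("equal consideration"),
# yet they give different values.
print(f"arithmetic mean: {arithmetic:.2f}")
print(f"geometric mean:  {geometric:.2f}")
```

The arithmetic rule returns Leibniz's 1300 crowns; the multiplicative rule returns roughly 1280.6 crowns, and nothing in the axiom itself chooses between them.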

With Laplace and Gauss there began a series of attempts to prove the worth of the arithmetic mean. It was discovered that its use involved the assumption of a particular type of law of error for the a priori probabilities of given errors. It was also found that the assumption of this law led on to a more complicated rule, known as the Method of Least Squares, for combining the results of observations which contain more than one doubtful quantity. In spite of a popular belief that, whilst the Arithmetic Mean is intuitively obvious, the Method of Least Squares depends upon doubtful and arbitrary assumptions, it can be demonstrated that the two stand and fall together.³

¹ De waerdye van de lyf-renten na proportie van de losrenten. The Hague, 1671.

2  Nouveaux  Essais.     Engl.  transl.  p.  540. 

3  Venn  (Logic  of  Chance,  p.  40)  thinks  that  the  Normal  Law  of  Error  and 
the  Method  of  Least  Squares  "  are  not  only  totally  distinct  things,  but  they  have 
scarcely  even  any  necessary  connection  with  each  other.     The  Law  of  Error 
is  the  statement  of  a  physical  fact.  .  .  .  The  Method  of  Least  Squares,  on  the 
other  hand,  is  not  a  law  at  all  in  the  scientific  sense  of  the  term.     It  is  simply 
a  rule  or  direction.  .  .  ." 
The analytical theorems of Laplace and Gauss are complicated, but the special assumptions upon which they are based are easily stated.¹ Gauss supposes (a) that the probability of a given error is a function of the error only and not also of the magnitude of the observation, (b) that the errors are so small that their cubes and higher powers may be neglected. Assumption (a) is arbitrary,² and Gauss did not state it explicitly. These two assumptions, together with certain others, lead us to the result. For let $\phi(z)$ be the law of error, where z is the error, and let us assume, as it always is assumed in these proofs, that $\phi(z)$ can be expanded by Maclaurin's Theorem. Then

$$\phi(z) = \phi(0) + z\,\phi'(0) + \frac{z^2}{2!}\,\phi''(0) + \frac{z^3}{3!}\,\phi'''(0) + \cdots.$$

It is also supposed that positive and negative errors are equally probable, i.e. $\phi(z) = \phi(-z)$, so that $\phi'(0)$ and $\phi'''(0)$ vanish. Since we may neglect $z^4$ in comparison with $z^2$, $\phi(z) = \phi(0) + \tfrac{1}{2} z^2 \phi''(0)$. But, writing $a = \phi(0)$ and $h = -\tfrac{1}{2}\phi''(0)$, and neglecting $z^4$ and higher powers, $a - hz^2 = a e^{-hz^2/a}$, so that $\phi(z) = a e^{-hz^2/a}$.

Gauss's proof looks much more complicated than this, but he obtains the form $a e^{-hz^2/a}$ by neglecting higher powers of z, so that this expression is really equivalent to $a - hz^2$. By this approximation he has reduced all the possible laws to an equivalent form.³ It is true, therefore, that the normal law of error is, to the second power of the error, equivalent to any law of error which is a function of the error only, and for which positive and negative errors are equally probable. Laplace also introduces assumptions equivalent to these.
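The claim that any smooth, symmetric law agrees with the normal form to the second power of the error can be illustrated numerically. The sketch takes a hypothetical law $\phi(z) = 1/(\pi(1 + z^2))$, chosen only because it is smooth and symmetric, sets $a = \phi(0)$ and $h = -\tfrac{1}{2}\phi''(0)$ with $\phi''(0)$ estimated by a central difference, and prints $(\phi(z) - a e^{-hz^2/a})/z^4$ for shrinking z. If the two expressions agree through the $z^2$ term, this ratio stays bounded; for this particular law it should approach $1/(2\pi) \approx 0.159$.

```python
import math


def cauchy_law(z):
    # A hypothetical smooth, symmetric law of error, chosen only for illustration.
    return 1.0 / (math.pi * (1.0 + z * z))


def gauss_constants(phi, d=1e-4):
    # a = phi(0); h = -phi''(0) / 2, with phi''(0) estimated by a central difference,
    # so that phi(z) is approximately a - h z^2 for small z.
    a = phi(0.0)
    second = (phi(d) - 2.0 * a + phi(-d)) / (d * d)
    return a, -second / 2.0


def fourth_order_ratio(phi, z):
    # (phi(z) - a e^{-h z^2 / a}) / z^4 stays bounded as z -> 0 when the two
    # expressions agree up to and including the z^2 term.
    a, h = gauss_constants(phi)
    return (phi(z) - a * math.exp(-h * z * z / a)) / z ** 4


for z in (0.2, 0.1, 0.05):
    print(z, fourth_order_ratio(cauchy_law, z))
```

The discrepancy is of order $z^4$, which is exactly the sense in which Gauss's form is "equivalent" to any such law and no more.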

¹ For an account of the three principal methods of arriving at the Method of Least Squares and the Arithmetic Mean, see Ellis, Least Squares. Gauss's first method is in the Theoria Motus, and his second in the Theoria Combinationis Observationum. Laplace's investigations are in chap. iv. of the second Book of the Théorie analytique. Laplace's method was improved by Poisson in the Connaissance des temps for 1827 and 1832.

² It does not follow, as G. Hagen argues (Grundzüge der Wahrscheinlichkeits-Rechnung, p. '20), that, because a larger error is less probable than a smaller, therefore the probability of a given error is a function of its magnitude only.

³ This is pointed out by Bertrand, Calcul des probabilités, p -'57.

While mathematicians have endeavoured to establish the normal law of error and the arithmetic mean as a law of logic, others have claimed for it the testimony of experience and have deemed it a law of nature.¹

That this cannot be so is evident. For suppose that $x_1, x_2, \ldots, x_n$ are a set of observations of an unknown quantity x. Then, by this principle, $x = \frac{1}{n}\sum x_r$ gives the most probable value of x. But suppose we had wished to determine $x^2$: our observations, assuming that we can multiply correctly, would be $x_1^2, x_2^2, \ldots, x_n^2$, and the most probable value of $x^2$ would be $\frac{1}{n}\sum x_r^2$. But $\left(\frac{1}{n}\sum x_r\right)^2 \neq \frac{1}{n}\sum x_r^2$. And in general, $\frac{1}{n}\sum f(x_r) \neq f\left(\frac{1}{n}\sum x_r\right)$. Nor is this a consideration which can safely be ignored in practice. For our "observations" are often the result of some manipulation, and the particular shape in which we get them is not necessarily fixed for us. It is not easy to say what the direct observation is. In particular, if any such law of sensation as that enunciated by Fechner is true (i.e. that sensation varies as the logarithm of the stimulus), the arithmetic mean must break down as a practical rule in all cases where human sensation is part of the instrument by means of which the observations are recorded.²
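The inequality $\frac{1}{n}\sum f(x_r) \neq f\left(\frac{1}{n}\sum x_r\right)$ is easily seen with numbers. In this sketch the observations are invented; squaring them, or taking logarithms as Fechner's law would suggest, changes the value that the arithmetic-mean rule picks out.

```python
import math

xs = [1.0, 2.0, 3.0, 6.0]  # hypothetical observations x_1 .. x_n
n = len(xs)

mean_x = sum(xs) / n                          # (1/n) sum x_r
square_of_mean = mean_x ** 2                  # ((1/n) sum x_r)^2
mean_of_squares = sum(x * x for x in xs) / n  # (1/n) sum x_r^2

# The same comparison for a logarithmic transformation, as under Fechner's law.
log_of_mean = math.log(mean_x)
mean_of_logs = sum(math.log(x) for x in xs) / n

print(square_of_mean, mean_of_squares)  # 9.0 12.5
print(log_of_mean, mean_of_logs)
```

Averaging the logarithms and transforming back gives the geometric mean, $\sqrt{6} \approx 2.45$ here, rather than the arithmetic mean of 3.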

Apart, however, from theoretical refutations, statisticians now recognise that the arithmetic mean and the normal law of error can only be applied to certain special classes of phenomena. Quetelet³ was, I think, the first to point this out. In England, Galton drew attention to the fact many years ago, and Professor Pearson⁴ has shown "that the Gaussian-Laplace normal distribution is very far from being a general law of frequency distribution either for errors of observation or for the distribution of deviations from type such as occur in organic populations. . . . It is not even approximately correct, for example, in the distribution of barometric variations, of grades of fertility and incidence of disease."

¹ This is, of course, a very common point of view indeed. Cf. Bertrand, op. cit. p. 183: "Malgré les objections précédentes, la formule de Gauss doit être adoptée. L'observation la confirme : cela doit suffire dans les applications." (In spite of the preceding objections, Gauss's formula must be adopted. Observation confirms it: that must suffice in applications.)

² This was noticed by Galton.

³ E.g. Letters on the Theory of Probabilities, p. 114.

⁴ On "Errors of Judgment, etc.," Phil. Trans. A, vol. cxcviii. pp. 235-299. The following quotation is from his memoir On the General Theory of Skew Correlation and Nonlinear Regression, where further references are given.
The Arithmetic Mean occupies, therefore, no unique position;
and  it  is  worth  while,  from  the  point  of  view  of  probability,  to 
discuss  the  properties  of  other  possible  means  and  laws  of  error, 
as,  for  example,  on  the  lines  indicated  in  the  earlier  part  of  this 
chapter. 

10.  The  Method  of  Least  Squares. — The  problem,  to  which  this 
method  is  applied,  is  no  more  than  the  application  of  the  same 
considerations,  as  those  which  we  have  just  been  discussing,  to 
cases  where  the  relation  between  the  observed  measurements  and 
the  quantity  whose  most  probable  value  we  require,  involves 
more  than  one  unknown. 

Owing to the surprising character of its conclusions, if they could be accepted as universally valid, and to the obscurity of the mathematical fabric that has been reared on and about it, this method has been surrounded by an unnecessary air of mystery. It is true that in recent times scepticism has grown at the expense of mystery. It is also true that just views have been held by individuals for sixty years past, notably by Leslie Ellis. But the old mistakes are not always corrected in the current text-books, and even so useful and generally used a treatise on Least Squares as Professor Mansfield Merriman's opens with a series of very fallacious statements.

The  controversial  side  of  the  Method  of  Least  Squares  is 
purely  logical  ;  in  the  later  developments  there  is  much  elaborate 
mathematics of whose correctness no one is in doubt. What it
is  important  to  state  with  the  utmost  possible  clearness  is  the 
precise  assumptions  on  which  the  mathematics  is  based  ;  when 
these  assumptions  have  been  set  forth,  it  remains  to  determine 
their  applicability  in  particular  cases. 

In dealing with averages we supposed ourselves to be presented with a number of direct observations of some quantity which it is desired to determine. But it is obvious that direct observations will be in many cases either impracticable or inconvenient; and our natural course will be to measure certain other quantities which we know to bear fixed and invariable relations to the unknowns we wish to determine. In surveying, for instance, or in astronomy, we constantly prefer to take measurements of angles or distances in which we are not interested for their own sakes, but which bear known geometrical relationships to the set of ultimate unknowns.
If we wish to determine the most probable values of a set of unknowns $x_1, x_2, x_3, \ldots, x_r$, instead of obtaining a number of sets of direct observations of each, we may obtain a number of equations of observation of the following type:

$$\begin{aligned}
a_1 x_1 + a_2 x_2 + \cdots + a_r x_r &= V_1,\\
b_1 x_1 + b_2 x_2 + \cdots + b_r x_r &= V_2,\\
&\;\;\vdots\\
k_1 x_1 + k_2 x_2 + \cdots + k_r x_r &= V_n,
\end{aligned}$$

where $V_1$, etc., are the quantities directly observed, and the a's, b's, etc., are supposed known ($n > r$).

We  have  in  such  a  case  n  equations  to  determine  r  unknowns, 
and  since  the  observations  are  likely  to  be  inexact,  there  may  be 
no  precise  solution  whatever.  In  these  circumstances  we  wish  to 
know  the  most  probable  set  of  values  of  the  x's  warranted  by 
these  observations. 

The  problem  is  precisely  similar  in  kind  to  that  dealt  with 
by  averages  and  differs  only  in  the  degree  of  its  complexity.  It 
is  the  problem  of  finding  the  most  probable  solution  of  such  a  set 
of  discrepant  equations  of  observation  that  the  Method  of  Least 
Squares  claims  to  solve. 

By 1750 the astronomers were obtaining such equations of observation in the course of their investigations, and the question arose as to the proper manner of their solution. Boscovich in Italy, Mayer and Lambert in Germany, Laplace in France, Euler in Russia, and Simpson in England proposed different methods of solution. Simpson, in 1757, was the first to introduce, by way of simplification, the assumption or axiom that positive and negative errors are equally probable.¹ The Method of Least Squares was first definitely stated by Legendre in 1805, who proposed it as an advantageous method of adjusting observations. This was soon followed by the 'proofs' of Laplace and Gauss. But it is easily shown that these proofs involve the normal law of error $y = k e^{-h^2 x^2}$, and the theory of Least Squares simply develops the mathematical results of applying to equations of observation, which involve more than one unknown, that law of error which leads to the Arithmetic Mean in the case of a single unknown.

¹ See Merriman's Method of Least Squares, p. 181, for an historical sketch, from which the above is taken. In 1877 Merriman published in the Transactions of the Connecticut Academy a list of writings relating to the Method of Least Squares and the theory of accidental errors of observation, which comprised 408 titles, classified as 313 memoirs, 72 books, 23 parts of books.
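Equations of observation of this kind can be handled by the usual normal-equations route. The sketch below assumes that each equation's error independently follows the normal law with one common constant, so that the most probable values are those minimising the sum of squared residuals; it forms $C^{\mathsf T} C\,x = C^{\mathsf T} V$ and solves it by Gaussian elimination. The four equations and their observed values are hypothetical, with n = 4 > r = 2.

```python
def solve(matrix, rhs):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def least_squares(coeffs, observed):
    # Normal equations (C^T C) x = C^T V. Minimising the sum of squared residuals
    # maximises prod exp(-k^2 e_i^2), i.e. the normal law with one common k.
    r = len(coeffs[0])
    ctc = [[sum(row[i] * row[j] for row in coeffs) for j in range(r)] for i in range(r)]
    ctv = [sum(row[i] * v for row, v in zip(coeffs, observed)) for i in range(r)]
    return solve(ctc, ctv)


def sum_sq_residuals(x, coeffs, observed):
    return sum((sum(c * xi for c, xi in zip(row, x)) - v) ** 2 for row, v in zip(coeffs, observed))


# Hypothetical equations of observation x_1 + t x_2 = V_t for t = 1..4 (n = 4 > r = 2).
coeffs = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
observed = [2.1, 2.9, 4.2, 4.8]
x = least_squares(coeffs, observed)
print(x)
```

With a single unknown and every coefficient equal to 1, the same routine returns the arithmetic mean of the observed values, which is the sense in which the two methods stand and fall together.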

11.  The  Weighting  of  Averages. — It  is  necessary  to  recur  to 
the  distinction  made  at  the  beginning  of  §  9  between  the  two 
types  to  which  our  average,  or,  as  it  is  generally  termed  in  social 
inquiries,  our  index  number,  may  belong.  The  average  or  index 
number  may  simply  summarise  a  set  of  facts  and  give  us  the 
actual  value  of  a  composite  quantity,  as,  for  example,  the  index 
number  of  the  cost  of  living.  In  such  cases  the  composite 
quantity, in which we are interested, need not contain precisely
the same number of units of each of the elementary quantities of
which it is composed, so that the 'weights,' which denote the
numbers of each elementary quantity appropriate to the composite
quantity, are part of the definition of the composite
quantity,  and  can  no  more  be  dispensed  with  than  the  magnitudes 
of  the  elementary  quantities  themselves.  Nor  in  such  cases  is 
the  rejection  of  discordant  observations  permissible  ;  if,  that  is 
to  say,  some  of  the  elementary  quantities  are  subject  to  much 
wider  variation,  or  to  variations  of  a  different  type  than  the 
majority,  that  is  no  reason  for  rejecting  them. 
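For an average of this first type the weights are simply the quantities that make up the composite. The sketch below assumes a hypothetical two-item basket and uses a fixed-basket (Laspeyres-type) price index as the illustration. That is one common construction, not the only one. The index is the cost of the basket at current prices divided by its cost at base prices. Dropping the weights gives a different number because it then measures a different quantity.

```python
def basket_cost(prices, quantities):
    """Cost of a composite commodity: sum of price * quantity over its items."""
    return sum(prices[item] * q for item, q in quantities.items())

def composite_index(base_prices, current_prices, quantities):
    """Ratio of the basket's current cost to its base cost.

    The quantities are part of the definition of the composite quantity, so
    they cannot be dropped, and no item may be rejected as discordant.
    """
    return basket_cost(current_prices, quantities) / basket_cost(base_prices, quantities)

basket = {"wheat": 50.0, "pins": 200.0}  # hypothetical units bought per period
base = {"wheat": 1.00, "pins": 0.01}     # hypothetical base-period prices
now = {"wheat": 1.20, "pins": 0.03}      # hypothetical current prices

index = composite_index(base, now, basket)
relatives = [now[k] / base[k] for k in basket]
unweighted = sum(relatives) / len(relatives)
print(index, unweighted)  # roughly 1.269 versus 2.1
```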

On  the  other  hand,  the  individual  items,  out  of  which  the 
average  is  composed,  may  each  be  indications  or  approximate 
estimates  of  some  one  single  quantity  ;  and  the  average,  instead 
of  representing  the  measure  of  a  composite  quantity,  may  be 
selected  as  furnishing  the  most  probable  value  of  the  single 
quantity, given, as evidence of its magnitude, the values of the
various  terms  which  make  up  the  average. 

If this is the character of our average, the problem of weighting
depends upon what we know about the individual observations
or samples or indications out of which our average is to be built
up. The units in question may be known to differ in respects
relevant to the probable value of the quaesitum. Thus there
may be reasons, quite apart from the actual results of the
individual observations or samples, for trusting some of them more
than  others.  Our  knowledge  may  indicate  to  us,  in  fact,  that 
the  constants  of  the  laws  of  error  appropriate  to  the  several 
instances,  even  if  the  type  of  the  law  can  be  assumed  to  be 
constant,  should  be  varied  according  to  the  data  we  possess  about 
each. It may also indicate to us that the condition of
independence between the instances, which the method of averages
presumes, is imperfectly satisfied, and consequently that our
mode  of  combining  the  instances  in  an  average  must  be  modified 
accordingly. 
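Prior knowledge may assign each source a known error spread. One standard way to turn that knowledge into weights is to weight each value by the reciprocal of its error variance, which gives the most probable value when errors are independent and normal. This is assumed here for illustration; the text does not prescribe it. The values and standard deviations below are hypothetical. The essential point is that the weights are fixed from data about each source before the values themselves are consulted.

```python
def precision_weighted_mean(values, sds):
    """Combine independent estimates of one quantity, weighting each by 1 / sd**2.

    Assumes independent normal errors whose standard deviations are known
    from data about each source, prior to and apart from the values themselves.
    """
    weights = [1.0 / s ** 2 for s in sds]
    estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    combined_sd = (1.0 / sum(weights)) ** 0.5  # spread of the combined estimate
    return estimate, combined_sd

# Hypothetical readings from three instruments of differing, known reliability.
est, sd = precision_weighted_mean([10.0, 10.6, 9.7], [0.1, 0.5, 0.2])
print(est, sd)
```

If the condition of independence fails, as the paragraph above allows, these weights would themselves need modifying. The formula is only a starting point.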

Some  modern  statisticians,  who,  really  influenced  perhaps  by 
practical  considerations,  have  been  inclined  to  deprecate  the 
importance  of  weighting  on  theoretical  grounds,  have  not  always 
been  quite  clear  what  kind  of  average  they  supposed  themselves 
to  be  dealing  with.  In  particular,  discussions  of  the  question  of 
weighting  in  connection  with  index  numbers  of  the  value  of 
money  have  suffered  from  this  confusion.  It  has  not  been  clear 
whether  such  index  numbers  really  represent  measures  of  a 
composite  quantity  or  whether  they  are  probable  estimates  of 
the  value  of  a  single  quantity  formed  by  combining  a  number  of 
independent  approximations  towards  the  value  of  this  quantity. 
The  original  Jevonian  conception  of  an  index  number  of  the 
value of money was decidedly of the latter type. Modern work
on  the  subject  has  been  increasingly  dominated  by  the  other 
conception.  A  discussion  of  where  the  truth  lies  would  lead  me 
too  far  into  the  field  of  a  subject-matter  alien  to  that  of  this 
treatise. 

Theoretical  arguments  against  weighting  have  sometimes 
been  based  on  the  fact  that  to  weight  the  items  of  the  average 
in  an  irrelevant  manner,  or,  as  it  is  generally  expressed,  in  a 
random  manner,  is  not  likely,  provided  the  variations  between 
the  weights  are  small  compared  with  the  variations  between  the 
items,  to  affect  the  result  very  much.  But  why  should  any  one 
wish  to  weight  an  average  "  at  random  "  ?  Such  observations 
overlook  the  real  meaning  and  significance  of  weights.  They  are 
probably  inspired  by  the  fact  that  a  superficial  treatment  of 
statistics  would  sometimes  lead  to  the  introduction  of  weights 
which  are  irrelevant.  In  drawing  a  conclusion,  for  example, 
from  the  vital  statistics  of  various  towns,  the  figures  of  population 
for the different towns may or may not be relevant to our
conclusion. It depends on the character of the argument. If they
are  relevant,  it  may  be  right  to  employ  them  as  weights.  If  they 
are  irrelevant,  it  must  be  wrong  and  unnecessary  to  do  so.  The 
fact  that  wheat  is  a  more  important  article  of  consumption  than 
pins  may,  on  certain  assumptions,  be  irrelevant  to  the  usefulness 
of  variations  in  the  price  of  each  article  as  indications  of  variation 
in  the  value  of  money.  With  other  assumptions,  it  may  be 
extremely relevant. Or again, we may know that observations
with  a  particular  instrument  tend  to  be  too  large  and  must, 
therefore,  be  weighted  down.  It  is  contrary  both  to  theory  and 
to  common  sense  to  suppose  that  the  possession  of  information 
as  to  the  relative  reliability  of  different  statistics  is  not  useful. 
There  is  no  place,  therefore,  in  my  judgment,  for  a  generalised 
argument  as  to  the  propriety  or  impropriety  of  weighting  an 
average. 
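The claim that slight 'random' weights hardly move the result can be checked directly. This also shows why such weighting is pointless. The sketch assumes hypothetical items and weights drawn uniformly from [1 - eps, 1 + eps]. Because the deviations from the plain mean sum to zero, the shift caused by such weights is at most eps / (1 - eps) times the largest deviation of any item from that mean.

```python
import random

def weighted_mean(xs, ws):
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

rng = random.Random(0)  # fixed seed so the run is repeatable
items = [rng.uniform(50, 150) for _ in range(20)]  # hypothetical, widely varying items
eps = 0.05
weights = [rng.uniform(1 - eps, 1 + eps) for _ in items]  # irrelevant, 'random' weights

plain = sum(items) / len(items)
shift = abs(weighted_mean(items, weights) - plain)
# Guaranteed bound: shift <= eps / (1 - eps) * max |x - plain|.
bound = eps / (1 - eps) * max(abs(x - plain) for x in items)
print(shift, bound)
```

The bound says nothing about relevant weights, such as a known bias in one instrument. In that case the weight carries information and the result should move.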

It  should  be  added  that,  where  we  seek  to  build  up  an  index 
number  of  a  conception,  which  is  quantitative  but  is  not  itself 
numerically measurable in any defined or unambiguous sense, by
combining a number of numerical quantities, which, while they
do not measure our quaesitum, are nevertheless indications of its
quantitative  variations  and  tend  to  fluctuate  in  the  same  sense, 
as,  for  example,  by  means  of  what  are  sometimes  called  economic 
barometers  of  the  state  of  business,  or  the  prosperity  of  the  country 
or  the  like,  some  very  confusing  questions  can  arise  both  as  to 
what  sort  of  a  thing  our  resulting  index  really  is,  and  as  to  the 
mode  of  compilation  appropriate  to  it. 

These  confusing  questions  always  arise  when,  instead  of 
measuring  a  quantity  directly,  we  seek  an  index  to  fluctuations 
in  its  magnitude  by  combining  in  an  average  the  fluctuations  of 
a  series  of  magnitudes,  which  are,  each  of  them  in  a  different  way, 
to some extent (but only to some extent), correlated with
fluctuations in our quaesitum. I must not burden this book with a
discussion of the problems of Index Numbers. But I venture to
think that they would be sooner cleared up if the natures and
purposes of differing index numbers were more sharply
distinguished: those, namely, which are simply descriptive of a composite
commodity; those which seek to combine results differing from
one another in a way analogous to the variations of an instrument
of precision; and those which combine results, not of the quaesitum
itself, but of various other quantities, variations in which are
partly due to variations in the quaesitum, but which we well
know to be also due to other distinguishable influences. Index
numbers of the third type are often treated by methods and
arguments only appropriate to those of the second type.

12. The Rejection of Discordant Observations. This differs
from the problem just discussed, because we have supposed so
far  that  our  system  of  weighting  is  determined  by  data  which  we 
possess prior to and apart from our knowledge of the actual
magnitude  of  the  items  of  our  average.  The  principle  of  the 
rejection  of  discordant  observations  comes  in  when  it  is  argued 
that,  if  one  or  more  of  our  observations  show  great  discrepancies 
from  the  results  of  the  greater  number,  these  ought  to  be  partly 
or  entirely  neglected  in  striking  the  average,  even  if  there  is  no 
reason,  except  their  discrepancy  from  the  rest,  for  attributing 
less  weight  to  them  than  to  the  others.  By  some  this  practice 
has  been  thought  to  be  in  accordance  with  the  dictates  of  common 
sense  ;  by  others  it  is  denounced  as  savouring  even  of  forgery.1 

This  controversy,  like  so  many  others  in  Probability,  is  due 
to  a  failure  to  understand  the  meaning  of  '  independence.'  The 
mathematics  of  the  orthodox  theory  of  Averages  and  Least 
Squares  depend,  as  we  have  seen,  upon  the  assumption  that  the 
observations  are  '  independent '  ;  but  this  has  sometimes  been 
interpreted  to  mean  a  physical  independence.  In  point  of  fact, 
the  theory  requires  that  the  observations  shall  be  independent, 
in  the  sense  that  a  knowledge  of  the  result  of  some  does  not  affect 
the  probability  that  the  others,  when  known,  involve  given 
errors. 

Clearly  there  may  be  initial  data  in  relation  to  which  this 
supposition  is  entirely  or  approximately  accurate.  But  in  many 
cases the assumption will be inadmissible. A knowledge of the
results  of  a  number  of  observations  may  lead  us  to  modify  our 
opinion  as  to  the  relative  reliabilities  of  others. 

The  question,  whether  or  not  discordant  observations  should 
be  specially  weighted  down,  turns,  therefore,  upon  the  nature  of 
the  preliminary  data  by  which  we  have  been  guided  in  initially 
adopting  a  particular  law  of  error  as  appropriate  to  the  observa 
tions.  If  the  observations  are,  relevant  to  these  data,  strictly 
'  independent,'  in  the  sense  required  for  probability,  then  rejection 
is  not  permissible.  But  if  this  condition  is  not  fulfilled,  a  bias 
against  discordant  observations  may  be  well  justified. 
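One way to make the point concrete is a contaminated-normal model. This is offered as an illustration, not as a method the text proposes. Before any observation is seen, each is assumed to be 'good' (error sd `sd`) with probability 1 - eps, or 'faulty' (error sd `sd * wide`) with probability eps. The observations are then not independent in the required sense, because the values of the others change the probability that a given one is faulty. The EM-style iteration below weights a discordant value down for exactly that reason. All parameter values and data are hypothetical.

```python
import math

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def contaminated_mean(xs, sd=0.2, wide=10.0, eps=0.05, iters=50):
    """EM-style estimate of one quantity under a contaminated-normal error law.

    Hypothetical model: each observation has error sd `sd` with probability
    1 - eps, or error sd `sd * wide` with probability eps (a 'faulty' reading).
    """
    mu = sum(xs) / len(xs)  # start from the plain arithmetic mean
    for _ in range(iters):
        weights = []
        for x in xs:
            good = (1 - eps) * normal_pdf(x, mu, sd)
            faulty = eps * normal_pdf(x, mu, sd * wide)
            r = good / (good + faulty)  # probability x is a good reading, given mu
            # Expected precision of x: its contribution to the estimate of mu.
            weights.append(r / sd ** 2 + (1 - r) / (sd * wide) ** 2)
        mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
    return mu

data = [10.1, 9.9, 10.0, 10.2, 9.8, 15.0]  # hypothetical; the last value is discordant
print(sum(data) / len(data), contaminated_mean(data))
```

The plain mean is pulled towards 15, while the model's estimate lands close to 10. Setting eps to 0 restores strict independence under a single normal law, and the procedure then returns the plain mean, with no rejection at all.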

1 E.g. G. Hagen's Grundzüge der Wahrscheinlichkeitsrechnung, p. 63: "The
deception that one commits by suppressing measurements can no more be
excused than if one were to falsify or fabricate measurements."
INDUCTION AND ANALOGY


CHAPTER   XVIII 

INTRODUCTION 

Nothing  so  like  as  eggs  ;  yet  no  one,  on  account  of  this  apparent  similarity, 
expects  the  same  taste  and  relish  in  all  of  them.  'Tis  only  after  a  long  course 
of uniform experiments in any kind, that we attain a firm reliance and security
with  regard  to  a  particular  event.  Now  where  is  that  process  of  reasoning, 
which  from  one  instance  draws  a  conclusion,  so  different  from  that  which  it 
infers  from  a  hundred  instances,  that  are  no  way  different  from  that  single 
instance? This question I propose as much for the sake of information, as
with  any  intention  of  raising  difficulties.  I  cannot  find,  I  cannot  imagine  any 
such  reasoning.  But  I  keep  my  mind  still  open  to  instruction,  if  any  one  will 
vouchsafe to bestow it on me.
HUME.1

1. I HAVE described Probability as comprising that part of
logic  which  deals  with  arguments  which  are  rational  but  not 
conclusive.  By  far  the  most  important  types  of  such  arguments 
are  those  which  are  based  on  the  methods  of  Induction  and 
Analogy.  Almost  all  empirical  science  rests  on  these.  And  the 
decisions  dictated  by  experience  in  the  ordinary  conduct  of  life 
generally depend on them. To the analysis and logical
justification of these methods the following chapters are directed.

Inductive  processes  have  formed,  of  course,  at  all  times  a 
vital,  habitual  part  of  the  mind's  machinery.  Whenever  we  learn 
by  experience,  we  are  using  them.  But  in  the  logic  of  the  schools 
they have taken their proper place slowly. No clear or
satisfactory account of them is to be found anywhere. Within and
yet  beyond  the  scope  of  formal  logic,  on  the  line,  apparently, 
between  mental  and  natural  philosophy,  Induction  has  been 
admitted  into  the  organon  of  scientific  proof,  without  much  help 
from  the  logicians,  no  one  quite  knows  when. 

2. What are its distinguishing characteristics? What are
the qualities which in ordinary discourse seem to afford strength
to an inductive argument?

1 Enquiry concerning Human Understanding.

I shall try to answer these questions before I proceed to
the more fundamental problem: What ground have we for
regarding such arguments as rational?

Let  the  reader  remember,  therefore,  that  in  the  first  of  the 
succeeding  chapters  my  main  purpose  is  no  more  than  to  state 
in  precise  language  what  elements  are  commonly  regarded  as 
adding  weight  to  an  empirical  or  inductive  argument.  This 
requires  some  patience  and  a  good  deal  of  definition  and  special 
terminology.  But  I  do  not  think  that  the  work  is  controversial. 
At  any  rate,  I  am  satisfied  myself  that  the  analysis  of  Chapter 
XIX.  is  fairly  adequate. 

In the next section, Chapters XX. and XXI., I continue in
part the same task, but also try to elucidate what sort of
assumptions, if we could adopt them, lie behind and are required by the
methods  just  analysed.  In  Chapter  XXII.  the  nature  of  these 
assumptions  is  discussed  further,  and  their  possible  justification 
is  debated. 

3.  The  passage  quoted  from  Hume  at  the  head  of  this  chapter 
is  a  good  introduction  to  our  subject.  Nothing  so  like  as  eggs, 
and  after  a  long  course  of  uniform  experiments  we  can  expect 
with  a  firm  reliance  and  security  the  same  taste  and  relish  in  all 
of  them.  The  eggs  must  be  like  eggs,  and  we  must  have  tasted 
many  of  them.  This  argument  is  based  partly  upon  Analogy 
and partly upon what may be termed Pure Induction. We argue
from  Analogy  in  so  far  as  we  depend  upon  the  likeness  of  the  eggs, 
and from Pure Induction when we trust the number of the
experiments.

It  will  be  useful  to  call  arguments  inductive  which  depend 
in  any  way  on  the  methods  of  Analogy  and  Pure  Induction.  But 
I  do  not  mean  to  suggest  by  the  use  of  the  term  inductive  that  these 
methods  are  necessarily  confined  to  the  objects  of  phenomenal 
experience  and  to  what  are  sometimes  called  empirical  questions  ; 
or  to  preclude  from  the  outset  the  possibility  of  their  use  in 
abstract  and  metaphysical  inquiries.  While  the  term  inductive 
will  be  employed  in  this  general  sense,  the  expression  Pure 
Induction  must  be  kept  for  that  part  of  the  argument  which 
arises  out  of  the  repetition  of  instances. 

4.  Hume's  account,  however,  is  incomplete.  His  argument 
could  have  been  improved.  His  experiments  should  not  have 
been  too  uniform,  and  ought  to  have  differed  from  one  another 
as much as possible in all respects save that of the likeness of the
eggs. He should have tried eggs in the town and in the country,
in January and in June. He might then have discovered that
eggs  could  be  good  or  bad,  however  like  they  looked. 

This  principle  of  varying  those  of  the  characteristics  of  the 
instances,  which  we  regard  in  the  conditions  of  our  generalisation 
as  non-essential,  may  be  termed  Negative  Analogy. 

It will be argued later on that an increase in the number of
experiments is only valuable in so far as, by increasing, or possibly
increasing, the variety found amongst the non-essential
characteristics of the instances, it strengthens the Negative Analogy.
If  Hume's  experiments  had  been  absolutely  uniform,  he  would 
have been right to raise doubts about the conclusion. There is
no process of reasoning which from one instance draws a
conclusion different from that which it infers from a hundred
instances, if the latter are known to be in no way different from
the  former.  Hume  has  unconsciously  misrepresented  the  typical 
inductive  argument. 

When  our  control  of  the  experiments  is  fairly  complete,  and 
the  conditions  in  which  they  take  place  are  well  known,  there  is 
not much room for assistance from Pure Induction. If the
Negative  Analogies  are  known,  there  is  no  need  to  count  the 
instances.  But  where  our  control  is  incomplete,  and  we  do  not 
know  accurately  in  what  ways  the  instances  differ  from  one 
another,  then  an  increase  in  the  mere  number  of  the  instances 
helps  the  argument.  For  unless  we  know  for  certain  that  the 
instances  are  perfectly  uniform,  each  new  instance  may  possibly 
add to the Negative Analogy.

Hume might also have weakened his argument. He expects
no  more  than  the  same  taste  and  relish  from  his  eggs.  He 
attempts  no  conclusion  as  to  whether  his  stomach  will  always 
draw  from  them  the  same  nourishment.  He  has  conserved  the 
force  of  his  generalisation  by  keeping  it  narrow. 

5.  In  an  inductive  argument,  therefore,  we  start  with  a 
number of instances similar in some respects AB, dissimilar in
others C. We pick out one or more respects A in which the
instances  are  similar,  and  argue  that  some  of  the  other  respects 
B  in  which  they  are  also  similar  are  likely  to  be  associated  with 
the  characteristics  A  in  other  unexamined  cases.  The  more 
comprehensive  the  essential  characteristics  A,  the  greater  the 
variety amongst the non-essential characteristics C, and the less
comprehensive  the  characteristics  B  which  we  seek  to  associate 
with A, the stronger is the likelihood or probability of the
generalisation we seek to establish.

These  are  the  three  ultimate  logical  elements  on  which  the 
probability  of  an  empirical  argument  depends, — the  Positive 
and  the  Negative  Analogies  and  the  scope  of  the  generalisation. 

6.  Amongst    the    generalisations    arising    out    of    empirical 
argument  we  can  distinguish  two  separate  types.     The  first  of 
these may be termed universal induction. Although such
inductions are themselves susceptible of any degree of probability,
they  affirm  invariable  relations.     The  generalisations  which  they 
assert,  that  is  to  say,  claim  universality,  and  are  upset  if   a 
single  exception  to  them  can  be  discovered.     Only  in  the  more 
exact  sciences,  however,  do   we  aim  at  establishing  universal 
inductions. In the majority of cases we are content with that
other  kind  of  induction  which  leads  up  to  laws  upon   which 
we  can  generally  depend,  but  which  does  not  claim,  however 
adequately  established,  to  assert  a  law  of  more  than  probable 
connection.1 This second type may be termed Inductive
Correlation. If, for instance, we base upon the data, that this and that
and  those  swans  are  white,  the  conclusion  that  all  swans  are  white, 
we  are  endeavouring  to  establish  a  universal  induction.     But  if 
we  base  upon  the  data  that  this  and  those  swans  are  white  and 
that  swan  is  black,  the  conclusion  that  most  swans  are  white, 
or  that  the  probability  of  a  swan's  being  white  is  such  and  such, 
then  we  are  establishing  an  inductive  correlation. 

Of these two types, the former, universal induction, presents
both the simpler and the more fundamental problem. In
this  part  of  my  treatise  I  shall  confine  myself  to  it  almost  entirely. 
In  Part  V.,  on  the  Foundations  of  Statistical  Inference,  I  shall 
discuss,  so  far  as  I  can,  the  logical  basis  of  inductive  correlation. 

7.  The  fundamental  connection  between  Inductive  Method 
and  Probability  deserves  all  the  emphasis  I  can  give  it.      Many 
writers,  it  is  true,  have  recognised  that  the  conclusions  which  we 
reach  by  inductive   argument   are   probable   and   inconclusive. 
Jevons,  for  instance,  endeavoured  to  justify  inductive  processes 
by  means  of  the  principles  of  inverse  probability.     And  it  is  true 
also  that  much  of  the  work  of  Laplace  and  his  followers  was 

1  What  Mill  calls  '  approximate  generalisations.' 
directed to the solution of essentially inductive problems. But
it  has  been  seldom  apprehended  clearly,  either  by  these  writers 
or by others, that the validity of every induction, strictly
interpreted, depends, not on a matter of fact, but on the existence of
a  relation  of  probability.  An  inductive  argument  affirms,  not 
that  a  certain  matter  of  fact  is  so,  but  that  relative  to  certain 
evidence  there  is  a  probability  in  its  favour.  The  validity  of  the 
induction,  relative  to  the  original  evidence,  is  not  upset,  therefore, 
if,  as  a  fact,  the  truth  turns  out  to  be  otherwise. 

The  clear  apprehension  of  this  truth  profoundly  modifies 
our  attitude  towards  the  solution  of  the  inductive  problem.  The 
validity  of  the  inductive  method  does  not  depend  on  the  success 
of  its  predictions.  Its  repeated  failure  in  the  past  may,  of  course, 
supply  us  with  new  evidence,  the  inclusion  of  which  will  modify 
the force of subsequent inductions. But the force of the old
induction  relative  to  the  old  evidence  is  untouched.  The  evidence 
with  which  our  experience  has  supplied  us  in  the  past  may  have 
proved  misleading,  but  this  is  entirely  irrelevant  to  the 
question  of  what  conclusion  we  ought  reasonably  to  have 
drawn  from  the  evidence  then  before  us.  The  validity  and 
reasonable  nature  of  inductive  generalisation  is,  therefore,  a 
question  of  logic  and  not  of  experience,  of  formal  and  not  of 
material  laws.  The  actual  constitution  of  the  phenomenal 
universe  determines  the  character  of  our  evidence  ;  but  it  cannot 
determine what conclusions given evidence rationally supports.


CHAPTER  XIX 

THE  NATURE  OF  ARGUMENT  BY  ANALOGY 

All  kinds  of  reasoning  from  causes  or  effects  are  founded  on  two  particulars, 
viz.  the  constant  conjunction  of  any  two  objects  in  all  past  experience,  and  the 
resemblance  of  a  present  object  to  any  of  them.  Without  some  degree  of 
resemblance, as well as union, 'tis impossible there can be any reasoning.

HUME.1 

1.  HUME  rightly  maintains  that  some  degree  of  resemblance 
must  always  exist  between  the  various  instances  upon  which  a 
generalisation  is  based.  For  they  must  have  this,  at  least,  in 
common,  that  they  are  instances  of  the  proposition  which 
generalises  them.  Some  element  of  analogy  must,  therefore, 
lie  at  the  base  of  every  inductive  argument.  In  this  chapter  I 
shall  try  to  explain  with  precision  the  meaning  of  Analogy,  and 
to  analyse  the  reasons,  for  which,  rightly  or  wrongly,  we  usually 
regard  analogies  as  strong  or  weak,  without  considering  at  present 
whether  it  is  possible  to  find  a  good  reason  for  our  instinctive 
principle  that  likeness  breeds  the  expectation  of  likeness. 

2. There are a few technical terms to be defined. We mean
by  a  generalisation  a  statement  that  all  of  a  certain  definable  class 
of  propositions  are  true.  It  is  convenient  to  specify  this  class 
in the following way. If $f(x)$ is true for all those values of $x$ for
which $\phi(x)$ is true, then we have a generalisation about $\phi$ and $f$
which we may write $g(\phi, f)$. If, for example, we are dealing with
the generalisation "All swans are white," this is equivalent to
the statement "'$x$ is white' is true for all those values of $x$ for
which '$x$ is a swan' is true." The proposition $\phi(a).f(a)$ is an
instance of the generalisation $g(\phi, f)$.
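Over a finite domain, the definition of $g(\phi, f)$ can be written down directly. The sketch below assumes a small hypothetical list of birds. The text does not restrict generalisations to finite domains, so this shows only the logical form, not how such a generalisation could be established.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")

def generalisation(phi: Callable[[T], bool], f: Callable[[T], bool], domain: Iterable[T]) -> bool:
    """g(phi, f): f(x) is true for every x in the domain for which phi(x) is true."""
    return all(f(x) for x in domain if phi(x))

def instance(phi: Callable[[T], bool], f: Callable[[T], bool], a: T) -> bool:
    """The proposition phi(a).f(a): a satisfies both condition and conclusion."""
    return phi(a) and f(a)

def is_swan(bird):
    return bird[0] == "swan"

def is_white(bird):
    return bird[1] == "white"

birds = [("swan", "white"), ("swan", "white"), ("goose", "grey")]  # hypothetical domain
print(generalisation(is_swan, is_white, birds))                        # True
print(generalisation(is_swan, is_white, birds + [("swan", "black")]))  # False
```

A single black swan upsets the generalisation, which is the mark of a universal induction as distinguished in the previous chapter.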

By thus defining a generalisation in terms of propositional
functions, it becomes possible to deal with all kinds of generalisations
in a uniform way; and also to bring generalisation into
convenient connection with our definition of Analogy.

1 A Treatise of Human Nature.
If some one thing is true about both of two objects, if, that is
to say, they both satisfy the same propositional function, then to
this extent there is an analogy between them. Every
generalisation $g(\phi, f)$, therefore, asserts that one analogy is always
accompanied by another, namely, that between all objects having the
analogy $\phi$ there is also the analogy $f$. The set of propositional
functions which are satisfied by both of the two objects
constitutes the positive analogy. The analogies which would be
disclosed by complete knowledge may be termed the total positive
analogy; those which are relative to partial knowledge, the
known positive analogy.

As  the  positive  analogy  measures  the  resemblances,  so  the 
negative  analogy  measures  the  differences  between  the  two  objects. 
The  set  of  functions,  such  that  each  is  satisfied  by  one  and  not 
by  the  other  of  the  objects,  constitutes  the  negative  analogy. 
We  have,  as  before,  the  distinction  between  the  total  negative 
analogy  and  the  known  negative  analogy. 

This  set  of  definitions  is  soon  extended  to  the  cases  in  which 
the  number  of  instances  exceeds  two.  The  functions  which  are 
true  of  all  of  the  instances  constitute  the  positive  analogy  of  the 
set  of  instances,  and  those  which  are  true  of  some  only,  and  are 
false  of  others,  constitute  the  negative  analogy.  It  is  clear  that 
a  function,  which  represents  positive  analogy  for  a  group  of 
instances  taken  out  of  the  set,  may  be  a  negative  analogy  for  the 
set  as  a  whole.  Analogies  of  this  kind,  which  are  positive  for 
a  sub-class  of  the  instances,  but  negative  for  the  whole  class,  we 
may term sub-analogies. By this it is meant that there are
resemblances which are common to some of the instances, but
not  to  all. 
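These definitions can be made concrete if each instance is represented by the set of simple properties true of it. That representation is an assumption of this sketch: the text ranges over all propositional functions. Here, anything absent from a set is taken to be false, which corresponds to complete knowledge. The positive analogy is then an intersection, and the negative analogy is the remainder of the union. The eggs and their properties are hypothetical.

```python
from itertools import combinations

def positive_analogy(instances):
    """Properties true of every instance."""
    return frozenset.intersection(*instances)

def negative_analogy(instances):
    """Properties true of at least one instance and false of at least one."""
    return frozenset.union(*instances) - positive_analogy(instances)

def sub_analogies(instances):
    """Properties common to some sub-class of the instances but not to all.

    Any property shared by a sub-class of two or more instances is shared by
    some pair inside it, so checking pairs is enough.
    """
    whole_negative = negative_analogy(instances)
    found = frozenset()
    for pair in combinations(instances, 2):
        found |= positive_analogy(pair) & whole_negative
    return found

# Hypothetical eggs, each described completely by the set of properties it has.
a = frozenset({"egg-like", "white shell", "town", "January"})
b = frozenset({"egg-like", "white shell", "country", "June"})
c = frozenset({"egg-like", "white shell", "town", "June"})
eggs = [a, b, c]
print(sorted(positive_analogy(eggs)))  # ['egg-like', 'white shell']
print(sorted(negative_analogy(eggs)))  # ['January', 'June', 'country', 'town']
print(sorted(sub_analogies(eggs)))     # ['June', 'town']
```

'town' is positive for the sub-class {a, c} but negative for the whole set, which is exactly what the text calls a sub-analogy.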

A simple notation, in accordance with these definitions, will
be useful. If there is a positive analogy $\phi$ between a set of
instances $a_1 \ldots a_n$, whether or not this is the total analogy
between them, let us write this:

$$A_{a_1 \ldots a_n}(\phi).$$

And if there is a negative analogy $\phi'$, let us write this:

$$\bar{A}_{a_1 \ldots a_n}(\phi').$$

Thus $A_{a_1 \ldots a_n}(\phi)$ expresses the fact that there is a set of
characteristics $\phi$ which are common to all the instances, and
$\bar{A}_{a_1 \ldots a_n}(\phi')$ that there is a set of characteristics $\phi'$ which is
true of at least one of the instances and false of at least one.

3.  In  the  typical  argument  from  analogy  we  wish  to  generalise 
from  one  part  to  another  of  the  total  analogy  which  experience 
has  shown  to  exist  between  certain  selected  instances.     In  all  the 
cases where one characteristic $\phi$ has been found to exist, another
characteristic $f$ has been found to be associated with it. We argue
from this that any instance which is known to share the first
analogy $\phi$ is likely to share also the second analogy $f$. We have
found in certain cases, that is to say, that both $\phi$ and $f$ are true
of them; and we wish to assert $f$ as true of other cases in which
we have only observed $\phi$. We seek to establish the generalisation
$g(\phi, f)$ on the ground that $\phi$ and $f$ constitute between them an
observed positive analogy in a given set of experiences.

But  while  the  argument  is  of  this  character,  the  grounds,  upon 
which  we  attribute  more  or  less  weight  to  it,  are  often  rather 
complex  ;  and  we  must  discuss  them,  therefore,  in  a  systematic 
manner. 

4.  According  to  the  view  suggested  in  the  last  chapter,  the 
value  of  such  an  argument  depends  partly  upon  the  nature  of  the 
conclusion  which  we  seek  to  draw,  partly  upon  the  evidence 
which  supports  it.     If  Hume  had  expected  the  same  degree  of 
nourishment  as  well  as  the  same  taste  and  relish  from  all  of  the 
eggs,  he  would  have  drawn  a  conclusion  of  weaker  probability. 
Let  us  consider,  then,  this  dependence  of  the  probability  upon  the 
scope of the generalisation $g(\phi, f)$: upon the comprehensiveness,
that is to say, of the condition $\phi$ and the conclusion $f$ respectively.

The more comprehensive the condition $\phi$ and the less
comprehensive the conclusion $f$, the greater a priori probability do
we attribute to the generalisation $g$. With every increase in $\phi$
this probability increases, and with every increase in $f$ it will
diminish.
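The direction of these effects can be seen in a toy model. The model is wholly hypothetical and is not part of the text. Suppose there are three objects, and each independently has each of four simple properties p1, p2, q1, q2 with probability 1/2. The a priori probability of $g(\phi, f)$ is then the chance that no object has $\phi$ without $f$. Adding p2 to the condition raises that probability, and adding q2 to the conclusion lowers it.

```python
from fractions import Fraction
from itertools import product

PROPS = ("p1", "p2", "q1", "q2")  # hypothetical simple properties

def prob_generalisation(condition, conclusion, n_objects=3):
    """Probability of g(condition, conclusion) in a hypothetical toy universe.

    Each of n_objects independently has each property in PROPS with probability
    1/2, so all 16 property combinations ("types") are equally likely.
    """
    types = [dict(zip(PROPS, bits)) for bits in product([False, True], repeat=len(PROPS))]
    # One object violates the generalisation if it satisfies the condition but not the conclusion.
    violating = sum(
        1 for t in types
        if all(t[p] for p in condition) and not all(t[q] for q in conclusion)
    )
    p_violate = Fraction(violating, len(types))
    # The generalisation holds only if no object violates it.
    return (1 - p_violate) ** n_objects

base = prob_generalisation(("p1",), ("q1",))
wider_condition = prob_generalisation(("p1", "p2"), ("q1",))
wider_conclusion = prob_generalisation(("p1",), ("q1", "q2"))
print(base, wider_condition, wider_conclusion)  # 27/64 343/512 125/512
```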

1 Hence $\bar{A}_{a_1 \ldots a_n}(\phi') \equiv \Sigma_r\, \phi'(a_r) \,.\, \Sigma_s\, \bar{\phi}'(a_s)$.
The condition $\phi\ (= \phi_1\phi_2)$ is more comprehensive than the
condition $\phi_1$, relative to the general evidence $h$, if $\phi_2$ is a condition
independent of $\phi_1$ relative to $h$, $\phi_2$ being independent of $\phi_1$ if
$g(\phi_1, \phi_2)/h \neq 1$, i.e. if, relative to $h$, the satisfaction of $\phi_2$ is not
inferrible from that of $\phi_1$.

Similarly the conclusion $f\ (= f_1 f_2)$ is more comprehensive than
the conclusion $f_1$, relative to the general evidence $h$, if $f_2$ is a
conclusion independent of $f_1$ relative to $h$, i.e. if $g(f_1, f_2)/h \neq 1$.

If $\phi = \phi_1\phi_2$ and $f = f_1 f_2$, where $\phi_1$ and $\phi_2$ are independent and
$f_1$ and $f_2$ are independent relative to $h$, we have:


so that $g(\phi, f_1)/h > g(\phi, f)/h > g(\phi_1, f)/h$.

This proves the statement made above. It will be noticed
that we cannot necessarily compare the a priori probabilities
of two generalisations in respect of more and less, unless the
condition of the first is included in the condition of the second, and
the conclusion of the second is included in that of the first.
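One way to see why the comparison runs in this direction is sketched below using the multiplication theorem. It is not necessarily the form the text itself uses. Each identity splits a generalisation into the conjunction of two, and the probability of a conjunction cannot exceed the probability of its first member. The inequalities are strict whenever the second factor falls short of 1.

```latex
% Conclusion side: "all phi are f1 f2" is equivalent to "all phi are f1" and "all phi f1 are f2".
g(\phi, f_1 f_2) \equiv g(\phi, f_1) \,.\, g(\phi f_1, f_2)
\quad\Longrightarrow\quad
g(\phi, f)/h = g(\phi, f_1)/h \cdot g(\phi f_1, f_2)/g(\phi, f_1)h \;\le\; g(\phi, f_1)/h

% Condition side: phi_1 splits into phi_1 phi_2 and phi_1 \bar{\phi}_2.
g(\phi_1, f) \equiv g(\phi_1 \phi_2, f) \,.\, g(\phi_1 \bar{\phi}_2, f)
\quad\Longrightarrow\quad
g(\phi_1, f)/h = g(\phi, f)/h \cdot g(\phi_1 \bar{\phi}_2, f)/g(\phi, f)h \;\le\; g(\phi, f)/h
```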

We  see,  therefore,  that  some  generalisations  stand  initially 
in  a  stronger  position  than  others.  In  order  to  attain  a  given 
degree  of  probability,  generalisations  require,  according  to  their 
scope,  different  amounts  of  favourable  evidence  to  support  them. 

5. Let us now pass from the character of the generalisation a priori to the evidence by which we support it. Whenever the conclusion $f$ is complex, i.e. resolvable into the form $f_1f_2$ where $g(f_1, f_2)/h \neq 1$, we can express the probability of the generalisation $g(\phi, f)$ as the product of the probabilities of the two generalisations $g(\phi f_1, f_2)$ and $g(\phi, f_1)$. We may therefore assume in what follows that the conclusion $f$ is simple and not capable of further analysis, without diminishing the generality of our argument.

We  will  begin  with  the  simplest  case,  namely,  that  which 
arises  in  the  following  conditions.  First,  let  us  assume  that  our 
knowledge  of  the  examined  instances  is  complete,  so  that  we  know 
of  every  statement,  which  is  about  the  examined  instances, 
whether  it  is  true  or  false  of  each.1  Second,  let  us  assume  that 

1 If $\phi(a)$ is a proposition and $\phi(a) \equiv h \cdot \psi(a)$, where $h$ is a proposition not involving $a$, then we must regard $\psi(a)$, not $\phi(a)$, as the statement about $a$.
all the instances which are known to satisfy the condition $\phi$ are also known to satisfy the conclusion $f$ of the generalisation. And third, let us assume that there is nothing which is true of all the examined instances and yet not included either in $\phi$ or in $f$, i.e. that the positive analogy between the instances is exactly co-extensive with the analogy $\phi f$ which is covered by the generalisation.

Such  evidence  as  this  constitutes  what  we  may  term  a  perfect 
analogy.  The  argument  in  favour  of  the  generalisation  cannot 
be further improved by a knowledge of additional instances.
Since  the  positive  analogy  between  the  instances  is  exactly 
coextensive  with  the  analogy  covered  by  the  generalisation,  and 
since  our  knowledge  of  the  examined  instances  is  complete,  there 
is  no  need  to  take  account  of  the  negative  analogy. 
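The vocabulary of positive analogy, negative analogy and perfect analogy can be made concrete with a minimal Python sketch. It rests on a toy assumption that is ours, not the author's: each instance is described completely by a finite set of named yes/no characteristics. The names `swan`, `white` and `european` are hypothetical. The positive analogy is the set of characteristics, or their contradictories, common to every instance. The negative analogy is the set of characteristics true of some instances and false of others. The analogy is perfect when the positive analogy coincides with $\phi f$.

```python
from typing import Dict, List, Set, Tuple

# Toy encoding (our assumption, not Keynes's notation): with complete knowledge
# every instance records True or False for every named characteristic.
Instance = Dict[str, bool]
Literal = Tuple[str, bool]  # ("white", True) = "is white"; ("white", False) = "is not white"


def positive_analogy(instances: List[Instance]) -> Set[Literal]:
    """Literals true of every instance. A characteristic false of all the
    instances enters through its contradictory, e.g. ("european", False)."""
    first, *rest = instances
    return {(c, v) for c, v in first.items() if all(inst[c] == v for inst in rest)}


def negative_analogy(instances: List[Instance]) -> Set[str]:
    """Characteristics true of some instances and false of others."""
    return {c for c in instances[0] if len({inst[c] for inst in instances}) == 2}


def is_perfect_analogy(instances: List[Instance],
                       condition: Set[Literal],
                       conclusion: Set[Literal]) -> bool:
    """True when the positive analogy is exactly phi f: every instance
    satisfies phi and f, and the instances are alike in nothing else."""
    return positive_analogy(instances) == condition | conclusion


# Hypothetical instances and characteristics, for illustration only.
a1 = {"swan": True, "white": True, "european": True}
a2 = {"swan": True, "white": True, "european": False}
phi = {("swan", True)}
f = {("white", True)}

print(sorted(positive_analogy([a1, a2])))    # [('swan', True), ('white', True)]
print(negative_analogy([a1, a2]))            # {'european'}
print(is_perfect_analogy([a1, a2], phi, f))  # True
print(is_perfect_analogy([a1], phi, f))      # False: "european" is a superfluous resemblance
```

With the single instance `a1`, the resemblance in `european` is not covered by the generalisation, so the analogy is not perfect. Adding `a2` removes it.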

An analogy of this kind, however, is not likely to have much practical utility; for if the analogy covered by the generalisation covers the whole of the positive analogy between the instances, it is difficult to see to what other instances the generalisation can
be  applicable.  Any  instance,  about  which  everything  is  true 
which  is  true  of  all  of  a  set  of  instances,  must  be  identical  with 
one  of  them.  Indeed,  an  argument  from  perfect  analogy  can 
only  have  practical  utility,  if,  as  will  be  argued  later  on,  there  are 
some  distinctions  between  instances  which  are  irrelevant  for  the 
purposes  of  analogy,  and  if,  in  a  perfect  analogy,  the  positive 
analogy,  of  which  we  must  take  account,  need  cover  only  those 
distinctions  which  are  relevant.  In  this  case  a  generalisation 
based  on  perfect  analogy  might  cover  instances  numerically 
distinct  from  those  of  the  original  set. 

The law of the Uniformity of Nature appears to me to amount to an assertion that an analogy which is perfect, except that mere differences of position in time and space are treated as irrelevant, is a valid basis for a generalisation, two total causes being regarded as the same if they only differ in their positions in time or space. This, I think, is the whole of the importance which this law has for the theory of inductive argument. It involves the assertion of a generalised judgment of irrelevance, namely, of the irrelevance of mere position in time and space to generalisations which have no reference to particular positions in time and space. It is in respect of such position in time or space that 'nature' is supposed 'uniform.' The significance of the law and the nature of its justification, if any, are further discussed in Chapter XXII.

6. Let us now pass to the type which is next in order of simplicity. We will relax the third condition and no longer assume that the whole of the positive analogy between the instances is covered by the generalisation, though retaining the assumption that our knowledge of the examined instances is complete. We know, that is to say, that there are some respects in which the examined instances are all alike, and yet which are not covered by the generalisation. If $\phi_1$ is the part of the positive analogy between the instances which is not covered by the generalisation, then the probability of this type of argument from analogy can be written:

$$g(\phi, f)\,/\,\underset{a_1\ldots a_n}{A}(\phi\phi_1 f).$$
The value of this probability turns on the comprehensiveness of $\phi_1$. There are some characteristics $\phi_1$ common to all the instances, which the generalisation treats as unessential, but the less comprehensive these are the better. $\phi_1$ stands for the characteristics in which all the instances resemble one another outside those covered by the generalisation. To reduce those resemblances between the instances is the same thing as to increase the differences between them. And hence any increase in the Negative Analogy involves a reduction in the comprehensiveness of $\phi_1$. When, however, our knowledge of the instances is complete, it is not necessary to make separate mention of the negative analogy $\underset{a_1\ldots a_n}{\bar A}(\phi')$ in the above formula.


For $\phi'$ simply includes all those functions about the instances which are not included in $\phi\phi_1 f$, and of which the contradictories are not included in them; so that in stating $\underset{a_1\ldots a_n}{A}(\phi\phi_1 f)$, we state by implication $\underset{a_1\ldots a_n}{\bar A}(\phi')$ also.

The whole process of strengthening the argument in favour of the generalisation $g(\phi, f)$ by the accumulation of further experience appears to me to consist in making the argument approximate as nearly as possible to the conditions of a perfect analogy, by steadily reducing the comprehensiveness of those resemblances $\phi_1$ between the instances which our generalisation disregards. Thus the advantage of additional instances, derived from experience, arises not out of their number as such, but out of their tendency to limit and reduce the comprehensiveness of $\phi_1$, or, in other words, out of their tendency to increase the negative analogy $\phi'$, since $\phi_1$ and $\phi'$ comprise between them whatever is not covered by $\phi f$. The more numerous the instances, the less comprehensive their superfluous resemblances are likely to be. But a single additional instance which greatly reduced $\phi_1$ would increase the probability of the argument more than a large number of instances which affected $\phi_1$ less.
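The claim that one well-chosen instance can outweigh many repetitive ones can be illustrated with a toy model. The encoding is hypothetical: characteristics are named booleans, and knowledge of each instance is assumed complete. The sketch computes $\phi_1$ as the positive analogy minus $\phi f$. It then compares fifty new instances that differ from the first only in one respect with a single new instance that differs from it in every respect outside $\phi f$.

```python
from typing import Dict, List, Set, Tuple

Instance = Dict[str, bool]   # complete knowledge (toy assumption)
Literal = Tuple[str, bool]


def positive_analogy(instances: List[Instance]) -> Set[Literal]:
    """Literals true of every instance."""
    first, *rest = instances
    return {(c, v) for c, v in first.items() if all(inst[c] == v for inst in rest)}


def superfluous_resemblances(instances: List[Instance], covered: Set[Literal]) -> Set[Literal]:
    """phi_1: the part of the positive analogy not covered by phi f."""
    return positive_analogy(instances) - covered


covered = {("swan", True), ("white", True)}  # phi f (hypothetical)
a1 = {"swan": True, "white": True, "european": True, "river": True, "summer": True}

# Fifty new instances that differ from a1 only in the season of observation.
many_similar = [a1] + [dict(a1, summer=False) for _ in range(50)]
# One new instance that differs from a1 in every characteristic outside phi f.
one_varied = [a1, dict(a1, european=False, river=False, summer=False)]

print(sorted(superfluous_resemblances(many_similar, covered)))  # [('european', True), ('river', True)]
print(sorted(superfluous_resemblances(one_varied, covered)))    # []
```

The sketch measures only how comprehensive $\phi_1$ is. It makes no attempt to turn that into a numerical probability, and the text does not do so either.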

7. The nature of the argument examined so far is, then, that the instances all have some characteristics in common which we have ignored in framing our generalisation; but it is still assumed that our knowledge about the examined instances is complete. We will next dispense with this latter assumption, and deal with the case in which our knowledge of the characteristics of the examined instances themselves is or may be incomplete.

It  is  now  necessary  to  take  explicit  account  of  the  known 
negative  analogy.  For  when  the  known  positive  analogy  falls 
short  of  the  total  positive  analogy,  it  is  not  possible  to  infer  the 
negative  analogy  from  it.  Differences  may  be  known  between  the 
instances  which  cannot  be  inferred  from  the  known  positive 
analogy.  The  probability  of  the  argument  must,  therefore,  be 
written — 

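$$g(\phi, f)\,/\,\underset{a_1\ldots a_n}{A}(\phi\phi_1 f)\;\underset{a_1\ldots a_n}{\bar A}(\phi'),$$

where $\phi\phi_1 f$ stands for the characteristics in which all $n$ instances $a_1 \ldots a_n$ are known to be alike, and $\phi'$ stands for the characteristics in which they are known to differ.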
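Once knowledge of the instances is incomplete, the known positive and negative analogies must be computed separately. This is a minimal sketch under a hypothetical encoding in which `None` means that a characteristic of an instance is not known. A literal is in the known positive analogy only if it is known of every instance. A characteristic is in the known negative analogy only if it is known to be true of one instance and known to be false of another.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy encoding (our assumption): None records that a characteristic of an
# instance is not known.
Instance = Dict[str, Optional[bool]]
Literal = Tuple[str, bool]


def known_positive_analogy(instances: List[Instance]) -> Set[Literal]:
    """Literals known to hold of every instance."""
    first, *rest = instances
    return {(c, v) for c, v in first.items()
            if v is not None and all(inst.get(c) == v for inst in rest)}


def known_negative_analogy(instances: List[Instance]) -> Set[str]:
    """Characteristics known to be true of some instance and false of another."""
    names = set().union(*instances)
    return {c for c in names
            if any(inst.get(c) is True for inst in instances)
            and any(inst.get(c) is False for inst in instances)}


# Hypothetical, partly known instances.
a1 = {"swan": True, "white": True, "european": True, "river": None}
a2 = {"swan": True, "white": True, "european": None, "river": True}
a3 = {"swan": True, "white": True, "european": False, "river": True}

print(sorted(known_positive_analogy([a1, a2, a3])))  # [('swan', True), ('white', True)]
print(known_negative_analogy([a1, a2, a3]))          # {'european'}
```

Here `river` falls into neither set: it is known of two instances and unknown of the third. This is why, with incomplete knowledge, the negative analogy can no longer be read off from the known positive analogy.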

This argument is strengthened by any additional instance, or by any additional knowledge about the former instances, which diminishes the known superfluous resemblances $\phi_1$ or increases the negative analogy $\phi'$. The object of the accumulation of further experience is still the same as before, namely, to make the form of the argument approximate more and more closely to that of perfect analogy. Now, however, that our knowledge of the instances is no longer assumed to be complete, we must take account of the mere number $n$ of the instances, as well as of our specific knowledge in regard to them; for the more numerous the instances are, the greater the opportunity for the total negative analogy to exceed the known negative analogy. But the more complete our knowledge of the instances, the less attention need we pay to their mere number, and the more imperfect our knowledge the greater the stress which must be laid upon the argument from number. This part of the argument will be discussed in detail in the following chapter on Pure Induction.

8. When our knowledge of the instances is incomplete, there may exist analogies which are known to be true of some of the instances and are not known to be false of any. These sub-analogies (see § 2) are not so dangerous as the positive analogies $\phi_1$, which are known to be true of all the instances, but their existence is evidently an element of weakness, which we must endeavour to eliminate by the growth of knowledge and the multiplication of instances. A sub-analogy of this kind between the instances $a_r \ldots a_s$ may be written $\underset{a_r\ldots a_s}{A}(\psi_r)$. The formula, if it is to take account of all the relevant information, ought therefore to be written:

$$g(\phi, f)\,/\,\underset{a_1\ldots a_n}{A}(\phi\phi_1 f)\;\underset{a_1\ldots a_n}{\bar A}(\phi')\;\prod\Bigl\{\underset{a_r\ldots a_s}{A}(\psi_r)\Bigr\},$$

where the terms of $\prod\bigl\{\underset{a_r\ldots a_s}{A}(\psi_r)\bigr\}$ stand for the various sub-analogies between sub-classes of the instances, which are not included in $\phi\phi_1 f$ or in $\phi'$.
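Sub-analogies can be picked out mechanically in the same kind of toy encoding, where `None` means "not known" and the characteristics are hypothetical. In this sketch a literal counts as a sub-analogy if three things hold. It must be known of at least `min_support` instances; the default of two is our assumption, since the text says only "some of the instances". It must be known to be false of none. And it must not be known of all of them, since it would then belong to $\phi\phi_1 f$.

```python
from typing import Dict, List, Optional, Set, Tuple

Instance = Dict[str, Optional[bool]]   # None = not known (toy assumption)
Literal = Tuple[str, bool]


def sub_analogies(instances: List[Instance], min_support: int = 2) -> Set[Literal]:
    """Literals known to hold of at least `min_support` instances (a sub-class),
    known to fail of none, and not known to hold of all of them."""
    literals = {(c, v) for inst in instances for c, v in inst.items() if v is not None}
    result = set()
    for c, v in literals:
        known_true = sum(1 for inst in instances if inst.get(c) == v)
        known_false = any(inst.get(c) == (not v) for inst in instances)
        if not known_false and min_support <= known_true < len(instances):
            result.add((c, v))
    return result


# Hypothetical, partly known instances.
a1 = {"swan": True, "white": True, "european": True, "river": None}
a2 = {"swan": True, "white": True, "european": None, "river": True}
a3 = {"swan": True, "white": True, "european": False, "river": True}

print(sub_analogies([a1, a2, a3]))  # {('river', True)}
```

`european` is excluded because it is known to be true of one instance and false of another. `swan` and `white` are excluded because they are known of every instance.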

9. There is now another complexity to be introduced. We must dispense with the assumption that the whole of the analogy covered by the generalisation is known to exist in all the instances. For there may be some instances within our experience, about which our knowledge is incomplete, but which show part of the analogy required by the generalisation and nothing which contradicts it; and such instances afford some support to the generalisation. Suppose that ${}_b\phi$ and ${}_bf$ are part of $\phi$ and $f$ respectively; then we may have a set of instances $b_1 \ldots b_m$ which show the following analogies:

$$\underset{b_1\ldots b_m}{A}({}_b\phi\;{}_b\phi_1\;{}_bf)\;\underset{b_1\ldots b_m}{\bar A}({}_b\phi')\;\prod\Bigl\{\underset{b_r\ldots b_s}{A}({}_b\psi_r)\Bigr\},$$

where ${}_b\phi_1$ is the analogy not covered by the generalisation, and so on, as before.
The formula, therefore, is now as follows:

$$g(\phi, f)\,/\,\prod_{a,b,\ldots}\Bigl\{\underset{a_1\ldots a_n}{A}({}_a\phi\;{}_a\phi_1\;{}_af)\;\underset{a_1\ldots a_n}{\bar A}({}_a\phi')\Bigr\}\;\prod\Bigl\{A(\psi_r)\Bigr\}.$$

In this expression ${}_a\phi$ and ${}_af$ are the whole or part of $\phi$ and $f$. The product $\prod_{a,b,\ldots}$ is composed of the positive and negative analogies for each of the sets of instances $a_1 \ldots a_n$, $b_1 \ldots b_m$, etc. The product $\prod\{A(\psi_r)\}$ contains the various sub-analogies of different sub-classes of all the instances $a_1 \ldots a_n$, $b_1 \ldots b_m$, etc., regarded as one set.¹

10. This completes our classification of the positive evidence which supports a generalisation; but the probability may also be affected by a consideration of the negative evidence. So far we have taken account only of that part of the evidence which shows the whole or part of the analogy we require. We have neglected those instances of which $\phi$ (the condition of the generalisation), or $f$ (its conclusion), or part of $\phi$ or of $f$, is known to be false. If there are instances of which $\phi$ is true and $f$ false, it is clear that the generalisation is ruined. But cases in which we know part of $\phi$ to be true and $f$ to be false, and are ignorant as to the truth or falsity of the rest of $\phi$, weaken it to some extent. We must take account, therefore, of analogies

$$A({}_c\phi\;{}_c\bar f),$$

where ${}_c\phi$, part of $\phi$, is true of all the set, and ${}_cf$, part of $f$, is false of all the set, while the truth or falsity of some part of $\phi$ and $f$ is unknown. The negative evidence, however, can strengthen as well as weaken the evidence. We deem instances favourably relevant in which $\phi$ and $f$ are both false together.²

Our final formula, therefore, must include terms, similar to those in the formula which concludes § 9, not only for sets of instances which show analogies ${}_a\phi\,{}_af$, where ${}_a\phi$ and ${}_af$ are parts of $\phi$ and $f$, but also for sets which show analogies ${}_c\phi\,{}_c\bar f$,

1 Even if we want to distinguish between the sub-analogies of the a set and the sub-analogies of the b set, this information can be gathered from the pro-

2 I am disposed to think that we need not pay attention to instances for which part of $\phi$ is known to be false, and part of $f$ to be true. But the question is a little perplexing.
or analogies ${}_d\bar\phi\,{}_d\bar f$, where ${}_d\phi$ and ${}_df$ are the whole or part of $\phi$ and $f$, and $\bar\phi$, $\bar f$ are the contradictories of $\phi$ and $f$.¹

It should be added, perhaps, that the theoretical classification of most empirical arguments in daily use is complicated by the account which we reasonably take of generalisations previously established. We often take account indirectly, therefore, of evidence which supports in some degree other generalisations than that which we are concerned to establish or refute at the moment, but the probability of which is relevant to the problem under investigation.

11. The argument will be rendered unnecessarily complex, without much benefit to its theoretical interest, if we deal with the most general case of all. What follows, therefore, will deal with the formula of the third degree of generality, namely:

$$g(\phi, f)\,/\,\underset{a_1\ldots a_n}{A}(\phi\phi_1 f)\;\underset{a_1\ldots a_n}{\bar A}(\phi')\;\prod\Bigl\{\underset{a_r\ldots a_s}{A}(\psi_r)\Bigr\},$$

in which no partial instances occur, i.e. no instances in which only part of the analogy required by the generalisation is known to exist. In this third degree of generality, it will be remembered, our knowledge of the characteristics of the instances is incomplete, there is more analogy between the instances than is covered by the generalisation, and there are some sub-analogies to be reckoned with. In the above formula the incompleteness of our knowledge is implicitly recognised in that $\phi$, $\phi_1$, $f$ and $\phi'$ are not between them entirely comprehensive. It is also supposed that all the evidence we have is positive. That is to say, no knowledge is assumed of instances characterised by the conjunctions ${}_a\phi\,{}_a\bar f$, ${}_a\bar\phi\,{}_a\bar f$ or ${}_a\bar\phi\,{}_af$, where ${}_a\phi$ and ${}_af$ are part of $\phi$ and $f$.

An argument from experience, therefore, in which we establish, on the basis of examined instances, a generalisation applicable beyond these instances, can be strengthened by the following means, if we restrict our attention to the simpler type of case:

(1) By reducing the resemblances $\phi_1$ known to be common to all the instances, but ignored as unessential by the generalisation.

(2) By increasing the differences $\phi'$ known to exist between the instances.

1 Where the conclusion $f$ is simple and not complex (see § 5), some of these complications cannot, of course, arise.
(3) By diminishing the sub-analogies or unessential resemblances $\psi_r$ known to be common to some of the instances and not known to be false of any.

These results can generally be obtained in two ways, either by increasing the number of our instances or by increasing our knowledge of those we have.
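The two ways named above can be compared with a toy model. The characteristics are hypothetical, `None` means "not known", and a sub-analogy is assumed to need at least two supporting instances. Starting from three partly known instances, the sketch either adds one new instance or fills in one unknown value of an old instance. It then reports the sub-analogies and the known negative analogy in each case.

```python
from typing import Dict, List, Optional, Set, Tuple

Instance = Dict[str, Optional[bool]]   # None = not known (toy assumption)
Literal = Tuple[str, bool]


def known_negative_analogy(instances: List[Instance]) -> Set[str]:
    """Characteristics known true of some instance and known false of another."""
    names = set().union(*instances)
    return {c for c in names
            if any(inst.get(c) is True for inst in instances)
            and any(inst.get(c) is False for inst in instances)}


def sub_analogies(instances: List[Instance], min_support: int = 2) -> Set[Literal]:
    """Literals known of >= min_support instances but not all, known false of none."""
    literals = {(c, v) for inst in instances for c, v in inst.items() if v is not None}
    return {(c, v) for c, v in literals
            if not any(inst.get(c) == (not v) for inst in instances)
            and min_support <= sum(inst.get(c) == v for inst in instances) < len(instances)}


a1 = {"swan": True, "white": True, "european": True, "river": None}
a2 = {"swan": True, "white": True, "european": None, "river": True}
a3 = {"swan": True, "white": True, "european": False, "river": True}
before = [a1, a2, a3]

# Way 1: add a new instance about which we know a little.
more_instances = before + [{"swan": True, "white": True, "river": False}]
# Way 2: learn more about an instance we already have.
more_knowledge = [dict(a1, river=False), a2, a3]

for label, evidence in [("before", before), ("more instances", more_instances),
                        ("more knowledge", more_knowledge)]:
    print(label, sorted(sub_analogies(evidence)), sorted(known_negative_analogy(evidence)))
```

Both routes remove the sub-analogy `('river', True)` and add `river` to the known negative analogy. The three printed lines are `before [('river', True)] ['european']`, `more instances [] ['european', 'river']` and `more knowledge [] ['european', 'river']`.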

The reasons why these methods seem to common sense to strengthen the argument are fairly obvious. The object of (1) is to avoid the possibility that $\phi_1$ as well as $\phi$ is a necessary condition of $f$. The object of (2) is to avoid the possibility that there may be some resemblances additional to $\phi$, common to all the instances, which have escaped our notice. The object of (3) is to get rid of indications that the total value of $\phi_1$ may be greater than the known value. When $\phi\phi_1 f$ is the total positive analogy between the instances, so that the known value of $\phi_1$ is its total value, it is (1) which is fundamental; and we need take account of (2) and (3) only when our knowledge of the instances is incomplete. But when our knowledge of the instances is incomplete, so that $\phi_1$ falls short of its total value and we cannot infer $\phi'$ from it, it is better to regard (2) as fundamental; in any case every reduction of $\phi_1$ must increase $\phi'$.

12.  I  have  now  attempted  to  analyse  the  various  ways  in 
which  common  practice  seems  to  assume  that  considerations 
of  Analogy  can  yield  us  presumptive  evidence  in  favour  of  a 
generalisation. 

It  has  been  my  object,  in  making  a  classification  of  empirical 
arguments,  not  so  much  to  put  my  results  in  forms  closely  similar 
to  those  in  which  problems  of  generalisation  commonly  present 
themselves  to  scientific  investigators,  as  to  inquire  whether 
ultimate  uniformities  of  method  can  be  found  beneath  the 
innumerable modes, superficially differing from one another, in
which  we  do  in  fact  argue. 

I have not yet attempted to justify this way of arguing. After turning aside to discuss in more detail the method of Pure Induction, I shall make this attempt; or rather I shall try to see what sort of assumptions are capable of justifying empirical reasoning of this kind.


CHAPTER    XX 

THE VALUE OF MULTIPLICATION OF INSTANCES, OR PURE INDUCTION

1. It has often been thought that the essence of inductive argument lies in the multiplication of instances. "Where is that process of reasoning," Hume inquired, "which from one instance draws a conclusion, so different from that which it infers from a hundred instances, that are no way different from that single instance?" I repeat that by emphasising the number of the instances Hume obscured the real object of the method. If it were strictly true that the hundred instances are no way different from the single instance, Hume would be right to wonder in what manner they can strengthen the argument. The object of increasing the number of instances arises out of the fact that we are nearly always aware of some difference between the instances, and that even where the known difference is insignificant we may suspect, especially when our knowledge of the instances is very incomplete, that there may be more. Every new instance may diminish the unessential resemblances between the instances and, by introducing a new difference, increase the Negative Analogy. For this reason, and for this reason only, new instances are valuable.

If our premisses comprise the body of memory and tradition which has been originally derived from direct experience, and the conclusion which we seek to establish is the Newtonian theory of the Solar System, our argument is one of Pure Induction in so far as we support the Newtonian theory by pointing to the great number of consequences which it has in common with the facts of experience. The predictions of the Nautical Almanack are a consequence of the Newtonian theory, and these predictions are verified many thousand times a day. But even here the force of the argument largely depends, not on the mere number of these predictions, but on the knowledge that the circumstances in which they are fulfilled differ widely from one another in a vast number of important respects. The variety of the circumstances in which the Newtonian generalisation is fulfilled, rather than their number, is what seems to impress our reasonable faculties.

2. I hold, then, that our object is always to increase the Negative Analogy, or, which is the same thing, to diminish the characteristics common to all the examined instances and yet not taken account of by our generalisation. Our method, however, may be one which certainly achieves this object, or it may be one which possibly achieves it. The former of these, which is obviously the more satisfactory, may consist either in increasing our definite knowledge respecting instances examined already, or in finding additional instances respecting which definite knowledge is obtainable. The second of them consists in finding additional instances of the generalisation, about which, however, our definite knowledge may be meagre. If our knowledge about such further instances were more complete, they would either increase or leave unchanged the Negative Analogy. In the former case they would strengthen the argument and in the latter case they would not weaken it; they must, therefore, be allowed some weight. The two methods are not entirely distinct, because new instances, about which we have some knowledge but not much, may be known to increase the Negative Analogy a little by the first method, and suspected of increasing it further by the second.

It  is  characteristic  of  advanced  scientific  method  to  depend 
on  the  former,  and  of  the  crude  unregulated  induction  of  ordinary 
experience  to  depend  on  the  latter.  It  is  when  our  definite 
knowledge  about  the  instances  is  limited,  that  we  must  pay 
attention  to  their  number  rather  than  to  the  specific  differences 
between  them,  and  must  fall  back  on  what  I  term  Pure  Induction. 

In this chapter I investigate the conditions and the manner in which the mere repetition of instances can add to the force of the argument. The chief value of the chapter, in my judgment, is negative, and consists in showing that a line of advance, which might have seemed promising, turns out to be a blind alley, and that we are thrown back on known Analogy. Pure Induction will not give us any very substantial assistance in getting to the bottom of the general inductive problem.

3. The problem of generalisation¹ by Pure Induction can be stated in the following symbolic form:

Let $h$ represent the general a priori data of the investigation;
let $g$ represent the generalisation which we seek to establish;
let $x_1, x_2, \ldots, x_n$ represent instances of $g$.

Then $x_1/gh = 1$, $x_2/gh = 1$, …, $x_n/gh = 1$; given $g$, that is to say, the truth of each of its instances follows. The problem is to determine the probability $g/hx_1\ldots x_n$, i.e. the probability of the generalisation when $n$ instances of it are given. Our analysis will be simplified, and nothing of fundamental importance will be lost, if we assume that there is nothing in our a priori data which leads us to distinguish between the a priori likelihood of the different instances. We assume, that is to say, that there is no reason a priori for expecting the occurrence of any one instance with greater reliance than any other, i.e.

$$x_1/h = x_2/h = \cdots = x_n/h.$$

Write $g/hx_1x_2\ldots x_n = p_n$ and $x_n/hx_1x_2\ldots x_{n-1} = y_n$; then

$$p_n = \frac{x_n/ghx_1\ldots x_{n-1}}{x_n/hx_1\ldots x_{n-1}} \cdot g/hx_1\ldots x_{n-1} = \frac{p_{n-1}}{y_n},$$

since $x_n/ghx_1\ldots x_{n-1} = 1$. Hence $p_n = \dfrac{p_0}{y_1y_2\cdots y_n}$, where $p_0 = g/h$ is the a priori probability of the generalisation.

1 In the most general sense we can regard any proposition as the generalisation of all the propositions which follow from it. For if $h$ is any proposition, and we put $\phi(x) \equiv$ '$x$ can be inferred from $h$' and $f(x) \equiv x$, then $g(\phi, f) \equiv h$. Pure Induction consists in finding as many instances of a generalisation as possible. In the widest sense, therefore, it is the process of strengthening the probability of any proposition by adducing numerous instances of known truths which follow from it. The argument is one of Pure Induction, then, in so far as the probability of a conclusion is based upon the number of independent consequences which the conclusion and the premisses have in common.
It follows, therefore, that $p_n > p_{n-1}$ so long as $y_n \neq 1$. Further,

$$p_n = \frac{p_0}{y_1y_2\cdots y_n} = \frac{p_0}{x_1x_2\ldots x_n/h} = \frac{p_0}{p_0 + x_1x_2\ldots x_n/\bar g h\,(1 - p_0)}.$$

This approaches unity as a limit if $\dfrac{x_1x_2\ldots x_n/\bar g h}{p_0}$ approaches zero as a limit when $n$ increases.
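The recurrence $p_n = p_{n-1}/y_n$ and the closed form can be checked against each other numerically. In this minimal Python sketch, the values of $x_r/x_1\ldots x_{r-1}\bar g h$ (the probability of the $r$-th instance given the earlier ones and the falsity of $g$) are supplied as a list `q`. These numbers are invented for illustration; the function names are hypothetical too. The denominator $y_n$ is expanded by the addition theorem as $y_n = p_{n-1} + (1 - p_{n-1})\,q_n$, which uses $x_n/ghx_1\ldots x_{n-1} = 1$.

```python
from math import prod
from typing import List, Sequence


def posterior_path(p0: float, q: Sequence[float]) -> List[float]:
    """Return p_0, p_1, ..., p_n computed by p_n = p_{n-1} / y_n.

    q[n-1] plays the role of x_n / x_1...x_{n-1} g-bar h (assumed values).
    Because x_n / g h x_1...x_{n-1} = 1, the probability of the n-th
    instance is y_n = p_{n-1} + (1 - p_{n-1}) * q_n.
    """
    path = [p0]
    for q_n in q:
        p = path[-1]
        y_n = p + (1 - p) * q_n
        path.append(p / y_n)
    return path


def posterior_closed_form(p0: float, q: Sequence[float]) -> float:
    """p_n = p_0 / (p_0 + x_1...x_n/g-bar h * (1 - p_0)), the product of the q's."""
    return p0 / (p0 + prod(q) * (1 - p0))


p0 = 0.1
q = [0.5] * 10
path = posterior_path(p0, q)
print(round(path[-1], 4))                      # 0.9913
print(round(posterior_closed_form(p0, q), 4))  # 0.9913
```

Each step raises the probability whenever `q_n < 1`. When `q_n = 1`, the instance was certain even on the falsity of $g$, so $y_n = 1$ and the probability stays where it was. This is the condition discussed in §4.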

4.  We  may  now  stop  to  consider  how  much  this  argument  has 
proved.     We  have  shown  that  if  each  of  the  instances  necessarily 
follows  from  the  generalisation,  then  each  additional  instance 
increases  the  probability  of  the  generalisation,  so  long  as  the  new 
instance  could  not  have  been  predicted  with  certainty  from  a 
knowledge  of  the  former  instances.1     This  condition  is  the  same 
as  that  which  came  to  light  when  we  were  discussing  Analogy. 
If the new instance were identical with one of the former instances, a knowledge of the latter would enable us to predict it.
If  it  differs  or  may  differ  in  analogy,  then  the  condition  required 
above  is  satisfied. 

The  common  notion,  that  each  successive  verification  of  a 
doubtful  principle  strengthens  it,  is  formally  proved,  therefore, 
without  any  appeal  to  conceptions  of  law  or  of  causality.  But 
we  have  not  proved  that  this  probability  approaches  certainty  as 
a  limit,  or  even  that  our  conclusion  becomes  more  likely  than  not, 
as  the  number  of  verifications  or  instances  is  indefinitely  increased. 

5.  What  are  the  conditions  which  must  be  satisfied  in  order 
that  the  rate,  at  which  the  probability  of  the  generalisation 
increases,  may   be   such   that  it  will   approach   certainty  as  a 

1 Since $p_n > p_{n-1}$ so long as $y_n \neq 1$.
limit when the number of independent instances of it is indefinitely increased? As a basis for this investigation, we have already shown that $p_n$ approaches the limit of certainty for a generalisation $g$ if, as $n$ increases, $x_1x_2\ldots x_n/\bar g h$ becomes small compared with $p_0$. That is, it does so if the a priori probability of so many instances, assuming the falsehood of the generalisation, is small compared with the generalisation's a priori probability. It follows, therefore, that the probability of an induction tends towards certainty as a limit, when the number of instances is increased, provided that

$$x_r/x_1x_2\ldots x_{r-1}\bar g h < 1 - \epsilon$$

for all values of $r$, and $p_0 > \eta$. Here $\epsilon$ and $\eta$ are finite probabilities, separated, that is to say, from impossibility by a value of some finite amount, however small. These conditions appear simple, but the meaning of a 'finite probability' requires a word of explanation.¹

I argued in Chapter III that not all probabilities have an exact numerical value. In the case of some, I argued, one can say no more about their relation to certainty and impossibility than that they fall short of the former and exceed the latter. There is one class of probabilities, however, which I called the numerical class, the ratio of each of whose members to certainty can be expressed by some number less than unity; and we can sometimes compare a non-numerical probability in respect of more and less with one of these numerical probabilities. This enables us to give a definition of 'finite probability' which is capable of application to non-numerical as well as to numerical probabilities. I define a 'finite probability' as one which exceeds some numerical probability, the ratio of which to certainty can be expressed by a finite number.² The principal method by which a probability can be proved finite by a process of argument arises either when
1 The proof of these conditions, which is obvious, is as follows:

$$x_1x_2\ldots x_n/\bar g h = x_n/x_1\ldots x_{n-1}\bar g h \cdot x_{n-1}/x_1\ldots x_{n-2}\bar g h \cdots x_1/\bar g h < (1-\epsilon)^n,$$

where $\epsilon$ is finite, and $p_0 > \eta$, where $\eta$ is finite. There is always, under these conditions, some finite value of $n$ such that both $(1-\epsilon)^n$ and $\dfrac{(1-\epsilon)^n}{\eta}$ are less than any given finite quantity, however small.

2 Hence a series of probabilities $p_1, p_2, \ldots, p_r, \ldots$ approaches a limit $L$ if the following holds. Given any positive finite number $\epsilon$, however small, a positive integer $n$ can always be found such that, for all values of $r$ greater than $n$, the difference between $L$ and $p_r$ is less than $\epsilon\gamma$, where $\gamma$ is the measure of certainty.
its conclusion can be shown to be one of a finite number of
alternatives, which are between them exhaustive or, at any rate, have
a finite probability, and to which the Principle of Indifference
is applicable; or (more usually) when its conclusion is more
probable than some hypothesis which satisfies this first condition.
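The convergence argument can be checked numerically. Writing $\epsilon$ for the finite amount by which each $x_r/x_1 x_2 \ldots x_{r-1}\bar{g}h$ falls short of certainty and $\eta$ for a finite lower bound on $p_0$, the ratio $(1-p_n)/p_n$ is less than $(1-\epsilon)^n/\eta$. The sketch below is a minimal illustration under the assumption that numerical values for $\epsilon$ and $\eta$ are available; the names `instances_needed`, `p_n_lower_bound` and `delta` are hypothetical. It finds the smallest $n$ for which that bound falls below a chosen $\delta$, and the lower bound on $p_n$ it implies.

```python
def instances_needed(epsilon: float, eta: float, delta: float) -> int:
    """Smallest n with (1 - epsilon)**n / eta < delta.

    epsilon: finite margin by which every y_r falls short of certainty.
    eta:     finite lower bound on the a priori probability p_0.
    delta:   the 'given finite quantity, however small'.
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < eta <= 1.0 and delta > 0.0):
        raise ValueError("need 0 < epsilon < 1, 0 < eta <= 1 and delta > 0")
    n, bound = 0, 1.0 / eta
    while bound >= delta:  # terminates because 0 < 1 - epsilon < 1
        bound *= 1.0 - epsilon
        n += 1
    return n


def p_n_lower_bound(epsilon: float, eta: float, n: int) -> float:
    """Lower bound on p_n implied by (1 - p_n)/p_n < (1 - epsilon)**n / eta."""
    return 1.0 / (1.0 + (1.0 - epsilon) ** n / eta)


n = instances_needed(epsilon=0.5, eta=0.25, delta=0.01)
print(n, p_n_lower_bound(0.5, 0.25, n))  # 9 and a bound just above 0.992
```

However small $\epsilon$ and $\eta$ are, the loop ends at some finite $n$; small values only make that $n$ large.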

6. The conditions, which we have now established in order
that the probability of a pure induction may tend towards
certainty as the number of instances is increased, are (1) that
$x_r/x_1 x_2 \ldots x_{r-1}\bar{g}h$ falls short of certainty by a finite amount
for all values of $r$, and (2) that $p_0$, the a priori probability of our
generalisation, exceeds impossibility by a finite amount. It is
easy to see that we can show by an exactly similar argument that
the following more general conditions are equally satisfactory:

(1) That $x_r/x_1 x_2 \ldots x_{r-1}\bar{g}h$ falls short of certainty by a finite
amount for all values of $r$ beyond a specified value $s$.

(2) That $p_s$, the probability of the generalisation relative to
a knowledge of these first $s$ instances, exceeds impossibility by
a finite amount.
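The more general conditions can be traced step by step. This sketch assumes, as in pure induction, that the generalisation $g$ implies every instance, and writes $y_r$ for the probability $x_r/x_1 x_2 \ldots x_{r-1}\bar{g}h$. Bayes' rule then gives $p_r = p_{r-1}/(p_{r-1} + (1 - p_{r-1})\,y_r)$. The function names and the figures are hypothetical. The first $s$ instances are allowed to add nothing at all, and the probability still climbs towards certainty once $p_s$ is finite and the later $y_r$ stay a finite distance below 1.

```python
import math
from typing import Iterable, List, Sequence


def update(p_prev: float, y_r: float) -> float:
    """Probability of the generalisation g after instance r, from its value before.

    Assumes g implies every instance (so each instance is certain given g) and that
    y_r is the probability of instance r given the earlier instances, h and not-g.
    """
    return p_prev / (p_prev + (1.0 - p_prev) * y_r)


def trajectory(p_start: float, ys: Iterable[float]) -> List[float]:
    """The starting probability, then the probability after each instance in turn."""
    ps = [p_start]
    for y in ys:
        ps.append(update(ps[-1], y))
    return ps


def odds_against(p_start: float, ys: Sequence[float]) -> float:
    """Closed form of (1 - p_n)/p_n: (1 - p_start)/p_start * y_1 * ... * y_n."""
    return (1.0 - p_start) / p_start * math.prod(ys)


# Hypothetical figures: the first s = 5 instances add nothing (y_r = 1), and every
# later y_r stays at 0.9, a finite distance short of certainty.
ys = [1.0] * 5 + [0.9] * 100
ps = trajectory(0.01, ys)
print(ps[5], ps[-1])
print((1.0 - ps[-1]) / ps[-1], odds_against(0.01, ys))  # the two should agree
```

The step-by-step and closed-form calculations agree, which is a useful check that the recurrence is the same argument as the product formula.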

In  other  words  Pure  Induction  can  be  usefully  employed  to 
strengthen  an  argument  if,  after  a  certain  number  of  instances 
have  been  examined,  we  have,  from  some  other  source,  a  finite 
probability  in  favour  of  the  generalisation,  and,  assuming  the 
generalisation  is  false,  a  finite  uncertainty  as  to  its  conclusion 
being  satisfied  by  the  next  hitherto  unexamined  instance  which 
satisfies  its  premiss.  To  take  an  example,  Pure  Induction  can 
be  used  to  support  the  generalisation  that  the  sun  will  rise  every 
morning for the next million years, provided that with the
experience we have actually had there are finite probabilities,
however  small,  derived  from  some  other  source,  first,  in  favour  of 
the  generalisation,  and,  second,  in  favour  of  the  sun's  not  rising 
to-morrow  assuming  the  generalisation  to  be  false.  Given  these 
finite  probabilities,  obtained  otherwise,  however  small,  then  the 
probability  can  be  strengthened  and  can  tend  to  increase  towards 
certainty  by  the  mere  multiplication  of  instances  provided 
that  these  instances  are  so  far  distinct  that  they  are  not 
inferrible  one  from  another. 
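The requirement that the instances be not inferrible one from another has a simple numerical reading. Assuming, as above, that the generalisation implies each instance, and writing $y_r$ for the probability of the $r$-th instance given the earlier ones, the general evidence and the falsity of the generalisation, each instance multiplies the odds in favour of the generalisation by $1/y_r$, that is, adds $-\log y_r$ to the log-odds. An instance inferrible from the earlier ones has $y_r = 1$ and adds nothing. The prior of $10^{-6}$ and the value $y_r = 0.999$ below are hypothetical figures, not estimates for the sunrise.

```python
import math
from typing import Sequence


def log_odds_after(p0: float, ys: Sequence[float]) -> float:
    """Log-odds in favour of g after the instances (g assumed to imply each one)."""
    log_odds = math.log(p0 / (1.0 - p0))
    for y in ys:
        # Each instance adds -log(y_r) >= 0; an instance inferrible from the
        # earlier ones has y_r = 1 and so adds nothing at all.
        log_odds -= math.log(y)
    return log_odds


def probability(log_odds: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


p0 = 1e-6                     # hypothetical, very small prior from 'some other source'
distinct = [0.999] * 30_000   # each instance slightly informative
inferrible = [1.0] * 30_000   # repetitions that carry no fresh information
print(probability(log_odds_after(p0, distinct)))    # close to 1
print(probability(log_odds_after(p0, inferrible)))  # still about 1e-6
```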

7. Those supposed proofs of the Inductive Principle, which
are based openly or implicitly on an argument in inverse probability,
are all vitiated by unjustifiable assumptions relating
to the magnitude of the a priori probability $p_0$. Jevons, for
instance, avowedly assumes that we may, in the absence of special
information, suppose any unexamined hypothesis to be as likely
as not. It is difficult to see how such a belief, if even its most
immediate implications had been properly apprehended, could
have remained plausible to a mind of so sound a practical
judgment as his. The arguments against it and the contradictions
to which it leads have been dealt with in Chapter IV. The
demonstration of Laplace, which depends upon the Rule of
Succession, will be discussed in Chapter XXX.

8. The prior probability, which must always be found before
the method of pure induction can be usefully employed to support
a substantial argument, is derived, I think, in most ordinary
cases (with what justification it remains to discuss) from
considerations of Analogy. But the conditions of valid induction,
as they have been enunciated above, are quite independent of
analogy, and might be applicable to other types of argument.
In certain cases we might feel justified in assuming directly that
the necessary conditions are satisfied.

Our belief, for instance, in the validity of a logical scheme is
based partly upon inductive grounds, on the number of
conclusions, each seemingly true on its own account, which can be
derived from the axioms, and partly on a degree of self-evidence
in the axioms themselves sufficient to give them the initial
probability upon which induction can build. We depend upon
the initial presumption that, if a proposition appears to us to
be true, this is by itself, in the absence of opposing evidence,
some reason for its being as well as appearing true. We cannot
deny that what appears true is sometimes false, but, unless we
can assume some substantial relation of probability between
the appearance and the reality of truth, the possibility of
even probable knowledge is at an end.

The conception of our having some reason, though not a
conclusive one, for certain beliefs, arising out of direct inspection,
may prove important to the theory of epistemology. The old
metaphysics has been greatly hindered by reason of its having
always demanded demonstrative certainty. Much of the cogency
of Hume's criticism arises out of the assumption of methods
of certainty on the part of those systems against which it was
directed. The earlier realists were hampered by their not
perceiving that lesser claims in the beginning might yield them
what they wanted in the end. And transcendental philosophy
has partly arisen, I believe, through the belief that there is no
knowledge on these matters short of certain knowledge,
combined with the belief that such certain knowledge of
metaphysical questions is beyond the power of ordinary methods.

When  we  allow  that  probable  knowledge  is,  nevertheless,  real, 
a  new  method  of  argument  can  be  introduced  into  metaphysical 
discussions.  The  demonstrative  method  can  be  laid  on  one  side, 
and  we  may  attempt  to  advance  the  argument  by  taking  account 
of  circumstances  which  seem  to  give  some  reason  for  preferring 
one  alternative  to  another.  Great  progress  may  follow  if  the 
nature  and  reality  of  objects  of  perception,1  for  instance,  can  be 
usefully  investigated  by  methods  not  altogether  dissimilar  from 
those  employed  in  science  and  with  the  prospect  of  obtaining  as 
high  a  degree  of  certainty  as  that  which  belongs  to  some  scientific 
conclusions  ;  and  it  may  conceivably  be  shown  that  a  belief  in 
the  conclusions  of  science,  enunciated  in  any  reasonable  manner 
however  restricted,  involves  a  preference  for  some  metaphysical 
conclusions  over  others. 

9.  Apart  from  analysis,  careful  reflection  would  hardly  lead 
us  to  expect  that  a  conclusion  which  is  based  on  no  other  than 
grounds  of  pure  induction,  defined  as  I  have  defined  them  as 
consisting  of  repetition  of  instances  merely,  could  attain  in  this 
way  to  a  high  degree  of  probability.  To  this  extent  we  ought 
all of us to agree with Hume. We have found that the suggestions
of common sense are supported by more precise methods.
Moreover, we constantly distinguish between arguments, which
we call inductive, upon other grounds than the number of instances
upon which they are based; and under certain conditions
we  regard  as  crucial  an  insignificant  number  of  experiments.  The 
method  of  pure  induction  may  be  a  useful  means  of  strengthening 
a  probability  based  on  some  other  ground.  In  the  case,  however, 
of  most  scientific  arguments,  which  would  commonly  be  called 
inductive,  the  probability  that  we  are  right,  when  we  make 
predictions  on  the  basis  of  past  experience,  depends  not  so 
much  on  the  number  of  past  experiences  upon  which  we  rely, 
as  on  the  degree  in  which  the  circumstances  of  these  experiences 

1 A paper by Mr. G. E. Moore entitled "The Nature and Reality of Objects
of Perception," which was published in the Proceedings of the Aristotelian Society
for 1906, seems to me to apply for the first time a method somewhat resembling
that which is described above.
resemble the known circumstances in which the prediction is
to  take  effect.  Scientific  method,  indeed,  is  mainly  devoted  to 
discovering  means  of  so  heightening  the  known  analogy  that 
we  may  dispense  as  far  as  possible  with  the  methods  of  pure 
induction. 

When, therefore, our previous knowledge is considerable
and the analogy is good, the purely inductive part of the
argument may take a very subsidiary place. But when our knowledge
of the instances is slight, we may have to depend upon pure
induction a good deal. In an advanced science it is a last resort,
the least satisfactory of the methods. But sometimes it must
be our first resort, the method upon which we must depend in
the dawn of knowledge and in fundamental inquiries where
we must presuppose nothing.


CHAPTER XXI

THE   NATURE   OF   INDUCTIVE   ARGUMENT    CONTINUED 

1. IN the enunciation, given in the two preceding chapters, of the
Principles of Analogy and Pure Induction there has been no
reference to experience or causality or law. So far, the argument
has been perfectly formal and might relate to a set of
propositions of any type. But these methods are most commonly
employed in physical arguments where material objects or
experiences are the terms of the generalisation. We must
consider, therefore, whether there is any good ground, as some
logicians seem to have supposed, for restricting them to this
kind of inquiry.

I am inclined to think that, whether reasonably or not, we
naturally apply them to all kinds of argument alike, including
formal arguments as, for example, about numbers. When we
are told that Fermat's formula for a prime, namely, $2^{2^a} + 1$ for
all values of $a$, has been verified in every case in which
verification is not excessively laborious, namely, for $a = 1, 2, 3$,
and $4$, we feel that this is some reason for accepting it, or, at
least, that it raises a sufficient presumption to justify a
further examination of the formula.1 Yet there can be no
reference here to the uniformity of nature or physical causation. If
inductive methods are limited to natural objects, there can no
more be an appreciable ground for thinking that $2^{2^a} + 1$ is a true
formula for primes, because empirical methods show that it
yields primes up to $a = 4$, or even if they showed that it yielded
primes for every number up to a million million, than there is
to think that any formula which I may choose to write down

1 This formula has, in fact, been disproved in recent times, e.g. $2^{2^5} + 1 =
4{,}294{,}967{,}297 = 641 \times 6{,}700{,}417$. Thus it is no longer so good an illustration
as it would have been a hundred years ago.

at random is a true source of primes. To maintain that there is
no appreciable ground in such a case is paradoxical. If, on the
other hand, a partial verification does raise some just appreciable
presumption in the formula's favour, then we must include
numbers, at any rate, as well as material objects amongst the
proper subjects of the inductive method. The conclusion of
the previous chapter indicates, however, that, if arguments of
this kind have force, it can only be in virtue of there being
some finite a priori probability for the formula based on other
than inductive grounds.
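The arithmetic behind this example and its footnote is easy to check. A minimal sketch using plain trial division (chosen for simplicity, not efficiency) confirms that $2^{2^a} + 1$ is prime for $a = 1, 2, 3, 4$ and composite for $a = 5$. The check settles those particular cases only; it is exactly the kind of partial verification the paragraph discusses, and says nothing by itself about the formula's other values.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for the small Fermat numbers checked here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def fermat_number(a: int) -> int:
    """The value of 2**(2**a) + 1."""
    return 2 ** (2 ** a) + 1


for a in range(1, 6):
    f = fermat_number(a)
    print(a, f, is_prime(f))
# a = 1..4 give 5, 17, 257, 65537 (all prime); a = 5 gives 4294967297, not prime.
```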

There  are  some  illustrations  in  Jevons's  Principles  of  Science,1 
which  are  relevant  to  this  discussion.  We  find  it  to  be  true  of 
the  following  six  numbers  : 

5,  15,  35,  45,  65,  95 

that they all end in five, and are all divisible by five without
remainder. Would this fact, by itself, raise any kind of
presumption that all numbers ending in five are divisible by five without
remainder? Let us also consider the six numbers,

7, 17, 37, 47, 67, 97.

They  all  end  in  seven  and  also  agree  in  being  primes.  Would 
this  raise  a  presumption  in  favour  of  the  generalisation  that  all 
numbers  are  prime,  which  end  in  seven  ?  We  might  be  prejudiced 
in  favour  of  the  first  argument,  because  it  would  lead  us  to  a 
true  conclusion  ;  but  we  ought  not  to  be  prejudiced  against  the 
second  because  it  would  lead  us  to  a  false  one  ;  for  the  validity 
of  empirical  arguments  as  the  foundation  of  a  probability  cannot 
be  affected  by  the  actual  truth  or  falsity  of  their  conclusions. 
If,  on  the  evidence,  the  analogy  is  similar  and  equal,  and  if  the 
scope  of  the  generalisation  and  its  conclusion  is  similar,  then  the 
value  of  the  two  arguments  must  be  equal  also. 
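Both sets of instances can be checked mechanically. The sketch below confirms that all six observed numbers agree in each case, so the two arguments rest on evidence of the same form. Only a search beyond the instances (here over numbers ending in seven below 1000, an arbitrary bound chosen for illustration) shows that the second generalisation first fails at 27 while the first holds. That search bears on the truth of the conclusions, not on the value of either argument relative to its six instances.

```python
import math


def is_prime(n: int) -> bool:
    """Primality by trial division up to the integer square root."""
    return n > 1 and all(n % d for d in range(2, math.isqrt(n) + 1))


fives = [5, 15, 35, 45, 65, 95]
sevens = [7, 17, 37, 47, 67, 97]

print(all(n % 5 == 0 for n in fives))    # True: every instance agrees
print(all(is_prime(n) for n in sevens))  # True: every instance agrees here too

# Looking beyond the instances examined:
print(next(n for n in range(7, 1000, 10) if not is_prime(n)))  # 27
```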

Whether or not the use of empirical argument appears plausible
to  us  in  these  particular  examples,  it  is  certainly  true  that  many 
mathematical  theorems  have  actually  been  discovered  by  such 
methods.  Generalisations  have  been  suggested  nearly  as  often, 
perhaps,  in  the  logical  and  mathematical  sciences,  as  in  the 

1 pp. 229-231 (one-volume edition). Jevons uses these illustrations, not
for the purpose to which I am here putting them, but to demonstrate the
fallibility of empirical laws.
physical, by the recognition of particular instances, even where
formal proof has been forthcoming subsequently. Yet if the
suggestions of analogy have no appreciable probability in the
formal sciences, and should be permitted only in the material, it
must be unreasonable for us to pursue them. If no finite
probability exists that a formula, for which we have empirical
verification, is in fact universally true, Newton was acting fortunately,
but not reasonably, when he hit on the Binomial Theorem by
methods of empiricism.1
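What empirical support for a formula looks like can be illustrated with the binomial series itself. As a minimal sketch in that spirit (not a reconstruction of Newton's actual procedure), the code computes partial sums of the generalised binomial series for $(1+x)^\alpha$ and compares them with directly computed powers at a few test values chosen here for illustration. Agreement in every case examined is the kind of support the paragraph describes; it proves nothing about the untested cases.

```python
import math


def binom_coeff(alpha: float, k: int) -> float:
    """Generalised binomial coefficient alpha (alpha-1) ... (alpha-k+1) / k!."""
    c = 1.0
    for j in range(k):
        c *= (alpha - j) / (j + 1)
    return c


def binomial_series(alpha: float, x: float, terms: int = 60) -> float:
    """Partial sum of the series for (1 + x)**alpha; it converges for |x| < 1."""
    return sum(binom_coeff(alpha, k) * x ** k for k in range(terms))


# Spot checks: a verification by instances, not a proof.
for alpha, x in [(0.5, 0.1), (-1.0, 0.3), (1.0 / 3.0, -0.2)]:
    series, direct = binomial_series(alpha, x), (1.0 + x) ** alpha
    print(alpha, x, series, direct, math.isclose(series, direct, rel_tol=1e-12))
```

For a positive integer $\alpha$ the coefficients vanish after the $\alpha$-th term, so the series reduces to the familiar finite expansion.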

2.  I  am  inclined  to  believe,  therefore,  that,  if  we  trust  the 
promptings  of  common  sense,  we  have  the  same  kind  of  ground 
for  trusting  analogy  in  mathematics  that  we  have  in  physics, 
and  that  we  ought  to  be  able  to  apply  any  justification  of  the 
method,  which  suits  the  latter  case,  to  the  former  also.     This 
does not mean that the a priori probabilities, from some other
source  than  induction,  which  the  inductive  method  requires  as 
its  foundation,  may  not  be  sought  and  found  differently  in  the 
two    types  of   inquiry.      A  reason  why  it   has    been  thought 
that  analogy   ought  to  be  confined  to  natural  laws  may  be, 
perhaps,    that    in    most    of    those    cases,  in    which   we   could 
support  a  mathematical  theorem  by  a  very  strong  analogy,  the 
existence  of  a  formal  proof  has  done  away  with  the  necessity 
for  the  limping  methods   of  empiricism  ;    and  because  in  most 
mathematical   investigations,    while    in    our    earliest    thoughts 
we  are  not  ashamed  to  consult  analogy,  our  later  work  will  be 
more  profitably  spent  in  searching  for  a  formal  proof  than  in 
establishing  analogies  which  must,  at  the  best,  be  relatively  weak. 
As  the  modern  scientist  discards,  as  a  rule,  the  method  of  pure 
induction,    in    favour   of    experimental    analogy,    where,   if   he 
takes  account  of  his  previous  knowledge,  one  or  two  cases  may 
prove   immensely   significant ;     so   the   modern   mathematician 
prefers   the   resources    of   his    analysis,   which   may  yield  him 
certainty,  to  the  doubtful  promises  of  empiricism. 

3. The main reason, however, why it has often been held that
we ought to limit inductive methods to the content of the
particular material universe in which we live, is, most probably, the
fact that we can easily imagine a universe so constructed that
such methods would be useless. This suggests that analogy and
induction, while they happen to be useful to us in this world,
cannot be universal principles of logic, on the same footing, for
instance, as the syllogism.

1 See Jevons, loc. cit. p. 231.

In  one  sense  this  opinion  may  be  well  founded.  I  do  not  deny 
or  affirm  at  present  that  it  may  be  necessary  to  confine  inductive 
methods  to  arguments  about  certain  kinds  of  objects  or  certain 
kinds of experiences. It may be true that in every useful
argument from analogy our premisses must contain fundamental
assumptions, obtained directly and not inductively, which some
possible experiences might preclude. Moreover, the success of
induction in the past can certainly affect its probable usefulness
for the future. We may discover something about the nature
of the universe (we may even discover it by means of induction
itself) the knowledge of which has the effect of destroying the
further utility of induction. I shall argue later on that the
confidence  with  which  we  ourselves  use  the  method  does  in 
fact  depend  upon  the  nature  of  our  past  experience. 

But this empirical attitude towards induction may, on the
other hand, arise out of either one of two possible confusions.
It may confuse, first, the reasonable character of arguments
with their practical usefulness. The usefulness of induction
depends, no doubt, upon the actual content of experience. If
there were no repetition of detail in the universe, induction
would have no utility. If there were only a single object in the
universe, the laws of addition would have no utility. But the
processes of induction and addition would remain reasonable.
It may confuse, secondly, the validity of attributing probability
to the conclusion of an argument with the question of the actual
truth of the conclusion. Induction tells us that, on the basis of
certain evidence, a certain conclusion is reasonable, not that it is
true. If the sun does not rise to-morrow, if Queen Anne still
lives, this will not prove that it was foolish or unreasonable of us
to have believed the contrary.

4. It will be worth while to say a little more in this connection
about the not infrequent failure to distinguish the rational from
the true. The excessive ridicule, which this mistake has visited
on the supposed irrationality of barbarous and primitive peoples,
affords some good examples. "Reflection and enquiry should
satisfy us," says Dr. Frazer in the Golden Bough, "that to our
predecessors we are indebted for much of what we thought most
our own, and that their errors were not wilful extravagances
or the ravings of insanity, but simply hypotheses, justifiable as
such at the time when they were propounded, but which a fuller
experience has proved to be inadequate. . . . Therefore, in
reviewing the opinions and practices of ruder ages and races we
shall do well to look with leniency upon their errors as inevitable
slips made in the search for truth. . . ." The first introduction of
iron ploughshares into Poland, he tells us in another passage, having
been followed by a succession of bad harvests, the farmers
attributed the badness of the crops to the iron ploughshares, and
discarded them for the old wooden ones. The method of reasoning
of the farmers is not different from that of science, and may,
surely, have had for them some appreciable probability in its
favour. "It is a curious superstition," says a recent pioneer in
Borneo, "this of the Dusuns, to attribute anything, whether
good or bad, lucky or unlucky, that happens to them to
something novel which has arrived in their country. For instance,
my living in Kindram has caused the intensely hot weather we
have experienced of late."1 What is this curious superstition
but the Method of Difference?
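Mill's Method of Difference, in its simplest textbook form, says that if an instance in which a phenomenon occurs and one in which it does not have every circumstance in common save one, that circumstance is the cause, or an indispensable part of the cause. A minimal sketch, with hypothetical circumstance labels standing in for what the Polish farmers knew, shows that their inference has exactly this shape. The conclusion is only as good as the assumption that the recorded circumstances are the only relevant ones, which is the sense in which it is reasonable relative to its premisses.

```python
from typing import Optional, Set


def method_of_difference(effect_present: Set[str], effect_absent: Set[str]) -> Optional[str]:
    """Mill's Method of Difference in its simplest form.

    If the instance with the effect has exactly one circumstance that the instance
    without it lacks, and they share everything else, return that circumstance.
    Otherwise return None: the method licenses no conclusion.
    """
    only_present = effect_present - effect_absent
    only_absent = effect_absent - effect_present
    if len(only_present) == 1 and not only_absent:
        return next(iter(only_present))
    return None


# Hypothetical record of the farmers' circumstances, as they saw them.
good_years = {"same fields", "same seed", "same sowing time"}
bad_years = good_years | {"iron ploughshares"}
print(method_of_difference(bad_years, good_years))  # iron ploughshares
```

Had the farmers also noted a second new circumstance, say an unusually wet spring, the function would return `None`: the method would no longer single out the ploughshares.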

The following passage from Jevons's Principles of Science well
illustrates the tendency, to which he himself yielded, to depreciate
the favourite analogies of one age, because the experience of
their  successors  has  confuted  them.  Between  things  which  are 
the  same  in  number,  he  points  out,  there  is  a  certain  resemblance, 
namely  in  number  ;  and  in  the  infancy  of  science  men  could  not 
be  persuaded  that  there  was  not  a  deeper  resemblance  implied 
in  that  of  number.  "  Seven  days  are  mentioned  in  Genesis  ; 
infants  acquire  their  teeth  at  the  end  of  seven  months  ;  they 
change  them  at  the  end  of  seven  years  ;  seven  feet  was  the  limit 
of  man's  height ;  every  seventh  year  was  a  climacteric  or  critical 
year,  at  which  a  change  of  disposition  took  place.  In  natural 
science  there  were  not  only  the  seven  planets,  and  the  seven 
metals,  but  also  the  seven  primitive  colours,  and  the  seven  tones 
of  music.  So  deep  a  hold  did  this  doctrine  take  that  we  still  have 
its  results  in  many  customs,  not  only  in  the  seven  days  of  the 
week,  but  the  seven  years'  apprenticeship,  puberty  at  fourteen 
years,  the  second  climacteric,  and  legal  majority  at  twenty-one 
years,  the  third  climacteric."  Religious  systems  from  Pythagoras 
to  Comte  have  sought  to  derive  strength  from  the  virtue  of  seven. 
1 Golden Bough, p. 174.
"And even in scientific matters the loftiest intellects have
occasionally yielded, as when Newton was misled by the analogy
between the seven tones of music and the seven colours of his
spectrum. . . . Even the genius of Huyghens did not prevent
him from inferring that but one satellite could belong to Saturn,
because, with those of Jupiter and the earth, it completed the
perfect number of six." But is it certain that Newton and
Huyghens were only reasonable when their theories were true,
and that their mistakes were the fruit of a disordered fancy?
Or that the savages, from whom we have inherited the most
fundamental inductions of our knowledge, were always
superstitious when they believed what we now know to be
preposterous?

It is important to understand that the common sense of the
race has been impressed by very weak analogies and has
attributed to them an appreciable probability, and that a logical
theory, which is to justify common sense, need not be afraid of
including these marginal cases. Even our belief in the real
existence of other people, which we all hold to be well
established, may require for its justification the combination of
experience with a just appreciable a priori possibility for
Animism generally.1 If we actually possess evidence which
renders some conclusion absurd, it is very difficult for us to
appreciate the relation of this conclusion to data which are
different and less complete; but it is essential that we should
realise arguments from analogy as relative to premisses, if we are
to approach the logical theory of Induction without prejudice.

5.  While  we  depreciate  the  former  probability  of  beliefs 
which  we  no  longer  hold,  we  tend,  I  think,  to  exaggerate  the 
present  degree  of  certainty  of  what  we  still  believe.  The  preceding 
paragraph  is  not  intended  to  deny  that  savages  often  greatly 

1 "This is animism, or that sense of something in Nature which to the
enlightened or civilised man is not there, and in the civilised man's child, if it
be admitted that he has it at all, is but a faint survival of a phase of the
primitive mind. And by animism I do not mean the theory of a soul in
nature, but the tendency or impulse or instinct, in which all myth originates,
to animate all things; the projection of ourselves into nature; the sense and
apprehension of an intelligence like our own, but more powerful in all visible
things" (Hudson, Far Away and Long Ago, pp. 224-5). This 'tendency or
impulse or instinct,' refined by reason and enlarged by experience, may be
required, in the shape of an intuitive a priori probability, if some of those
universal conclusions of common sense, which the most sceptical do not kick
away, are to be supported with rational foundations.
overestimate the value of their crude inductions, and are to this
extent  irrational.  It  is  not  easy  to  distinguish  between  a  belief's 
being  the  most  reasonable  of  those  which  it  is  open  to  us  to 
believe,  and  its  being  more  probable  than  not.  In  the  same  way 
we, perhaps, put an excessive confidence in those conclusions
(the existence of other people, for instance, the law of gravity, or
to-morrow's sunrise) of which, in comparison with many other
beliefs, we are very well assured. We may sometimes confuse
the practical certainty, attaching to the class of beliefs upon which
it is rational to act with the utmost confidence, with the more
wholly objective certainty of logic. We might rashly assert, for
instance, that to-morrow's sunrise is as likely to us as failure,
and  the  special  virtue  of  the  number  seven  as  unlikely,  even  to 
Pythagoras,  as  success,  in  an  attempt  to  throw  heads  a  hundred 
times  in  succession  with  an  unbiassed  coin.1 
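Grimsehl's figure in the footnote can be roughly reproduced. This sketch assumes about two thousand million tossers (a round figure chosen here, not taken from the text), one toss each per second, and, as a simplification, one run of a hundred heads per $2^{100}$ tosses in all; a careful treatment of overlapping runs changes this only by a small factor. Under those assumptions the mean interval comes to about $2 \times 10^{13}$ years, which is consistent with "twenty billion years" if billion is read in the older British sense of $10^{12}$.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds


def years_per_run(run_length: int, tossers: float, tosses_per_second: float = 1.0) -> float:
    """Rough mean time between runs of `run_length` heads with an unbiased coin.

    Simplifying assumption: one such run per 2**run_length tosses in total.
    """
    tosses_needed = 2 ** run_length
    seconds = tosses_needed / (tossers * tosses_per_second)
    return seconds / SECONDS_PER_YEAR


years = years_per_run(100, tossers=2e9)  # assumed population of about 2 * 10**9
print(f"{years:.2e}")  # roughly 2e13 years
```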

6.  As  it  has  often  been  held  upon  various  grounds,  with 
reason  or  without,  that  the  validity  of  Induction  and  Analogy 
depends  in  some  way  upon  the  character  of  the  actual  world, 
logicians  have  sought  for  material  laws  upon  which  these  methods 
can  be  founded.  The  Laws  of  Universal  Causation  and  the 
Uniformity  of  Nature,  namely,  that  all  events  have  some  cause 
and  that  the  same  total  cause  always  produces  the  same  effect, 
are  those  which  commonly  do  service.  But  these  principles 
merely  assert  that  there  are  some  data  from  which  events  posterior 
to  them  in  time  could  be  inferred.  They  do  not  seem  to  yield  us 
much  assistance  in  solving  the  inductive  problem  proper,  or  in 
determining  how  we  can  infer  with  probability  from  partial  data. 
It  has  been  suggested  in  the  previous  chapter  that  the  Principle 
of  the  Uniformity  of  Nature  amounts  to  an  assertion  that  an 
argument  from  perfect  analogy  (defined  as  I  have  defined  it)  is 
valid  when  applied  to  events  only  differing  in  their  positions  in 
time or space.2 It has also been pointed out that ordinary
inductive arguments appear to be strengthened by any evidence
which  makes  them  approximate  more  closely  in  character  to  a 
perfect  analogy.  But  this,  I  think,  is  the  whole  extent  to  which 
this  principle,  even  if  its  truth  could  be  assumed,  would  help  us. 

1  Yet  if  every  inhabitant  of  the  world,  Grimsehl  has  calculated,  were  to  toss 
a  coin  every  second,  day  and  night,  this  latter  event  would  only  occur  once  on 
the  average  in  every  twenty  billion  years. 

2  Is  this  interpretation  of  the  Principle  of  the  Uniformity  of  Nature  affected 
by  the  Doctrine  of  Relativity? 
States of the universe, identical in every particular, may never
recur,  and,  even  if  identical  states  were  to  recur,  we  should  not 
know  it. 

The  kind  of  fundamental  assumption  about  the  character  of 
material  laws,  on  which  scientists  appear  commonly  to  act, 
seems  to  me  to  be  much  less  simple  than  the  bare  principle  of 
Uniformity.  They  appear  to  assume  something  much  more  like 
what  mathematicians  call  the  principle  of  the  superposition  of 
small effects, or, as I prefer to call it, in this connection, the
atomic character of natural law. The system of the material
universe must consist, if this kind of assumption is warranted,
of bodies which we may term (without any implication as to
their size being conveyed thereby) legal atoms, such that each of
them exercises its own separate, independent, and invariable
effect, a change of the total state being compounded of a number
of separate changes each of which is solely due to a separate
portion of the preceding state. We do not have an invariable
relation between particular bodies, but nevertheless each has on
the others its own separate and invariable effect, which does not
change with changing circumstances, although, of course, the
total effect may be changed to almost any extent if all the other
accompanying causes are different. Each atom can, according
to this theory, be treated as a separate cause and does
not  enter  into  different  organic  combinations  in  each  of  which 
it  is  regulated  by  different  laws. 
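The atomic assumption can be illustrated with a toy model in which the total effect is simply the sum of each atom's separate, invariable effect. The atom names, the effect functions and the shared circumstance `x` are all hypothetical. The point is only that an atom's contribution is the same whatever else accompanies it, while the total may change to any extent when the accompanying atoms change.

```python
from typing import Callable, Dict

# Each hypothetical 'legal atom' has its own invariable effect, written here as a
# function of a shared circumstance x (an assumption made only for illustration).
Effect = Callable[[float], float]


def total_effect(atoms: Dict[str, Effect], x: float) -> float:
    """Atomic law: the total is compounded of the atoms' separate effects."""
    return sum(effect(x) for effect in atoms.values())


atoms: Dict[str, Effect] = {"a": lambda x: 2.0 * x, "b": lambda x: x * x, "c": lambda x: 1.0}
print(total_effect(atoms, 3.0))  # 6 + 9 + 1 = 16.0

# The contribution of a and b is fixed, whatever accompanies them.
known = {k: atoms[k] for k in ("a", "b")}
print(total_effect(known, 3.0))  # 15.0
```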

Perhaps it has not always been realised that this atomic
uniformity is in no way implied by the principle of the
Uniformity of Nature. Yet there might well be quite different
laws for wholes of different degrees of complexity, and laws of
connection between complexes which could not be stated in
terms of laws connecting individual parts. In this case
natural law would be organic and not, as it is generally
supposed, atomic. If every configuration of the Universe were
subject to a separate and independent law, or if very small
differences between bodies (in their shape or size, for instance)
led to their obeying quite different laws, prediction would be
impossible and the inductive method useless. Yet nature might
still be uniform, causation sovereign, and laws timeless and
absolute.

The scientist wishes, in fact, to assume that the occurrence
of a phenomenon which has appeared as part of a more complex
phenomenon may be some reason for expecting it to be associated
on another occasion with part of the same complex. Yet if
different wholes were subject to different laws qua wholes and
not simply on account of and in proportion to the differences of
their parts, knowledge of a part could not lead, it would seem,
even to presumptive or probable knowledge as to its association
with other parts. Given, on the other hand, a number of legally
atomic units and the laws connecting them, it would be possible
to deduce their effects pro tanto without an exhaustive knowledge
of all the coexisting circumstances.
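
A toy sketch may make the contrast between atomic and organic law concrete. It is purely illustrative and rests on assumptions of our own, not the author's: a configuration is modelled as a collection of named legal atoms, the per-atom effects in the hypothetical table `ATOM_EFFECTS` are invented numbers, and superposition is taken in its simplest additive form. The function names are likewise hypothetical.

```python
# Illustrative per-atom effects (invented numbers, not data).
ATOM_EFFECTS = {"a": 2.0, "b": -1.0, "c": 0.5}


def atomic_effect(configuration):
    """Atomic law (assumed additive): each atom has its own separate,
    invariable effect, and the total change is the sum of these."""
    return sum(ATOM_EFFECTS[atom] for atom in configuration)


def laws_needed_atomic(n_atoms):
    """Under atomic law one invariable law per atom suffices."""
    return n_atoms


def laws_needed_organic(n_atoms):
    """Under organic law every non-empty whole may obey its own law,
    so there are as many laws as non-empty combinations: 2**n - 1."""
    return 2 ** n_atoms - 1


# Knowing the separate effects of "a" and "c", the atomic hypothesis
# predicts the effect of the whole "ac" without ever observing it.
print(atomic_effect(("a", "c")))                        # 2.5
print(laws_needed_atomic(10), laws_needed_organic(10))  # 10 1023
```

On the organic alternative, nothing learned about `a` and `c` separately would constrain the effect of the whole; with ten atoms one would need 1023 independent laws rather than ten, which is one way of seeing why prediction would then be impossible.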

We do habitually assume, I think, that the size of the atomic
unit is for mental events an individual consciousness, and for
material events an object small in relation to our perceptions.
These considerations do not show us a way by which we can
justify Induction. But they help to elucidate the kind of assumptions
which we do actually make, and may serve as an introduction
to what follows.


CHAPTER XXII

THE JUSTIFICATION OF THESE METHODS

1. THE general line of thought to be followed in this chapter may
be indicated, briefly, at the outset.

A system of facts or propositions, as we ordinarily conceive
it, may comprise an indefinite number of members. But the
ultimate constituents or indefinables of the system, which all
the members of it are about, are fewer in number than these
members themselves. Further, there are certain laws of necessary
connection between the members, by which it is meant (I do not
stop to consider whether more than this is meant) that the truth
or falsity of every member can be inferred from a knowledge of
the laws of necessary connection together with a knowledge of the
truth or falsity of some (but not all) of the members.

The ultimate constituents together with the laws of necessary
connection make up what I shall term the independent variety
of the system. The more numerous the ultimate constituents
and the necessary laws, the greater is the system's independent
variety. It is not necessary for my present purpose, which is
merely to bring before the reader's mind the sort of conception
which is in mine, that I should attempt a complete definition
of what I mean by a system.

Now it is characteristic of a system, as distinguished from
a collection of heterogeneous and independent facts or propositions,
that the number of its premisses, or, in other words, the
amount of independent variety in it, should be less than the
number of its members. But it is not an obviously essential
characteristic of a system that its premisses or its independent
variety should be actually finite. We must distinguish,
therefore, between systems which may be termed finite and
infinite respectively, the terms finite and infinite referring not to
the number of members in the system but to the amount of independent
variety in it.

The purpose of the discussion, which occupies the greater
part of this chapter, is to maintain that, if the premisses of our
argument permit us to assume that the facts or propositions,
with which the argument is concerned, belong to a finite system,
then probable knowledge can be validly obtained by means of
an inductive argument. I now proceed to approach the question
from a slightly different standpoint, the controlling idea, however,
being that which is outlined above.

2. What is our actual course of procedure in an inductive
argument? We have before us, let us suppose, a set of $n$ instances
which have $r$ known qualities, $a_1 a_2 \ldots a_r$, in common,
these $r$ qualities constituting the known positive analogy. From
these qualities three (say) are picked out, namely, $a_1, a_2, a_3$, and
we inquire with what probability all objects having these three
qualities have also certain other qualities which we have picked
out, namely, $a_{r-1}, a_r$. We wish to determine, that is to say,
whether the qualities $a_{r-1}, a_r$ are bound up with the qualities
$a_1, a_2, a_3$. In thus approaching this question we seem to
suppose that the qualities of an object are bound together in
a limited number of groups, a sub-class of each group being an
infallible symptom of the coexistence of certain other members
of it also.

Three possibilities are open, any of which would prove
destructive to our generalisation. It may be the case (1) that
$a_{r-1}$ or $a_r$ is independent of all the other qualities of the instances
(they may not overlap, that is to say, with any other groups);
or (2) that $a_1 a_2 a_3$ do not belong to the same groups as $a_{r-1} a_r$;
or (3) that $a_1 a_2 a_3$, while they belong to the same group as $a_{r-1} a_r$,
are not sufficient to specify this group uniquely (they belong,
that is to say, to other groups also which do not include $a_{r-1}$ and
$a_r$). The precautions we take are directed towards reducing the
likelihood, so far as we can, of each of these possibilities. We
distrust the generalisation if the terms typified by $a_{r-1} a_r$ are
numerous and comprehensive, because this increases the likelihood
that some at least of them fall under heading (1), and also
because it increases the likelihood of (3). We trust it if the
terms typified by $a_1 a_2 a_3$ are numerous and comprehensive,
because this decreases the likelihood both of (2) and of (3). If
we find a new instance which agrees with the former instances in
$a_1 a_2 a_3 a_{r-1} a_r$ but not in $a_4$, we welcome it, because this disposes of
the possibility that it is $a_4$, alone or in combination, that is bound
up with $a_{r-1} a_r$. We desire to increase our knowledge of the
properties, lest there be some positive analogy which is escaping us,
and when our knowledge is incomplete we multiply instances,
which we do not know for certain to increase the negative analogy,
in the hope that they may do so.

If we sum up the various methods of Analogy, we find, I
think, that they are all capable of arising out of an underlying
assumption, that if we find two sets of qualities in coexistence
there is a finite probability that they belong to the same group,
and a finite probability also that the first set specifies this group
uniquely. Starting from this assumption, the object of the
methods is to increase the finite probability and make it large.
Whether or not anything of this sort is explicitly present to our
minds when we reason scientifically, it seems clear to me that we
do act exactly as we should act, if this were the assumption from
which we set out.

In most cases, of course, the field is greatly simplified from
the first by the use of our pre-existing knowledge. Of the
properties before us we generally have good reason, derived
from prior analogies, for supposing some to belong to the same
group and others to belong to different groups. But this does
not affect the theoretical problem confronting us.

3. What kind of ground could justify us in assuming the
existence of these finite probabilities which we seem to require?
If we are to obtain them, not directly, but by means of argument,
we must somehow base them upon a finite number of exhaustive
alternatives.

The following line of argument seems to me to represent, on
the whole, the kind of assumption which is obscurely present to
our minds. We suppose, I think, that the almost innumerable
apparent properties of any given object all arise out of a finite
number of generator properties, which we may call $\phi_1 \phi_2 \phi_3 \ldots$.
Some arise out of $\phi_1$ alone, some out of $\phi_1$ in conjunction with $\phi_2$,
and so on. The properties which arise out of $\phi_1$ alone form one
group; those which arise out of $\phi_1 \phi_2$ in conjunction form another
group, and so on. Since the number of generator properties is
finite, the number of groups also is finite. If a set of apparent
properties arises (say) out of three generator properties $\phi_1 \phi_2 \phi_3$,
then this set of properties may be said to specify the group
$\phi_1 \phi_2 \phi_3$. Since the total number of apparent properties is assumed
to be greater than that of the generator properties, and since the
number of groups is finite, it follows that, if two sets of apparent
properties are taken, there is, in the absence of evidence to the
contrary, a finite probability that the second set will belong
to the group specified by the first set.
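
The counting step in this argument can be sketched as follows. The modelling choices are ours, not the author's: a group is identified with a non-empty combination of generator properties, so that $n$ generators give $2^n - 1$ possible groups, and, purely for illustration, each group is treated as equally likely a priori. "Finite" is used in the text's sense of "not infinitely small"; the sketch only shows that, however large $n$ is, the resulting probability is a definite positive number. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def groups(generators):
    """All groups: every non-empty combination of generator properties."""
    return [
        frozenset(combo)
        for k in range(1, len(generators) + 1)
        for combo in combinations(generators, k)
    ]


def prob_same_group(n_generators):
    """Probability that a second set of apparent properties arises out of
    exactly the group specified by the first, on the hypothetical
    assumption that each of the 2**n - 1 groups is equally likely."""
    return Fraction(1, 2 ** n_generators - 1)


gens = ["phi1", "phi2", "phi3"]
print(len(groups(gens)))         # 7
print(prob_same_group(3))        # 1/7
print(prob_same_group(20) > 0)   # True: small, but not infinitely small
```

Relaxing the equal-likelihood assumption changes the number but not the conclusion, so long as the group in question is not given zero prior weight.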

There is, however, the possibility of a plurality of generators.
The first set of apparent properties may specify more than one
group; there is, that is to say, more than one group of generators
competent to produce it; and some only of these
groups may contain the second set of properties. Let us, for
the moment, rule out this possibility.

When we argue from an analogy, and the instances have
two groups of characters in common, namely $\phi$ and $f$, either $f$
belongs to the group $\phi$ or it arises out of generators partly distinct
from those out of which $\phi$ arises. For the reason already explained
there is a finite probability that $f$ and $\phi$ belong to the
same group. If this is the case, i.e. if the generalisation $g(\phi, f)$
is valid, then $f$ will certainly be true of all other cases in which
$\phi$ is true; if this is not the case, then $f$ will not always be true
when $\phi$ is true. We have, therefore, the preliminary conditions
necessary for the application of pure induction. If $x_1, x_2$, etc., are
the instances,

$$g/h = p_0, \quad \text{where } p_0 \text{ is finite},$$
$$x_r/gh = 1, \quad \text{etc.},$$
$$\text{and} \quad x_r/x_1 x_2 \ldots x_{r-1} \bar{g} h = 1 - \epsilon, \quad \text{where } \epsilon \text{ is finite}.$$

And hence, by the argument of Chapter XX., the probability of a
generalisation, based on such evidence as this, is capable, under
suitable conditions, of tending towards certainty as a limit, when
the number of instances is increased.
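
The way these three conditions combine can be checked numerically. The sketch below applies Bayes's rule directly (a standard reconstruction, not a transcription of the argument of Chapter XX): if each instance is certain when $g$ is true, and has probability $1 - \epsilon$ when $g$ is false, then after $n$ instances the probability of $g$ is $p_0 / (p_0 + (1 - p_0)(1 - \epsilon)^n)$. The values chosen for $p_0$ and $\epsilon$ are arbitrary, and the function name is hypothetical.

```python
def posterior_generalisation(p0, eps, n):
    """Probability of the generalisation g after n confirming instances.

    Assumes the conditions stated in the text:
      g/h = p0                            (finite prior),
      x_r/gh = 1                          (each instance certain if g holds),
      x_r/x_1 ... x_{r-1} not-g h = 1-eps (short of certainty if g fails).
    Bayes's rule then gives p0 / (p0 + (1 - p0) * (1 - eps) ** n).
    """
    if not (0 < p0 <= 1 and 0 < eps <= 1):
        raise ValueError("p0 and eps must be finite (non-zero) probabilities")
    return p0 / (p0 + (1 - p0) * (1 - eps) ** n)


# A small prior is steadily overcome as instances accumulate.
for n in (0, 10, 50, 200):
    print(n, round(posterior_generalisation(p0=0.01, eps=0.1, n=n), 4))
```

If the shortfall were allowed to shrink towards zero from one instance to the next, the product of the factors $1 - \epsilon_r$ need not vanish, which is why the finiteness of $\epsilon$ matters to the limit.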

If $\phi$ is complex and includes a number of characters which
are not always found together, it must include a number of
separate generator properties and specify a large group; hence
the initial probability that $f$ belongs to this group is relatively
large. If, on the other hand, $f$ is complex, there will be, for the
same reasons mutatis mutandis, a relatively smaller initial probability
than otherwise that $f$ belongs to any other given group.



When the argument is mainly by analogy, we endeavour to
obtain evidence which makes the initial probability $p_0$ relatively
high; when the analogy is weak and the argument depends for
its strength upon pure induction, $p_0$ is small and $p_n$, which is
based upon numerous instances, depends for its magnitude upon
their number. But an argument from induction must always
involve some element of analogy, and, on the other hand, few
arguments from analogy can afford to ignore altogether the
strengthening influence of pure induction.

4. Let us consider the manner in which the methods of
analogy increase the initial likelihood that two characters belong
to the same group. The numerous characters of an object which
are known to us may be represented by $a_1 a_2 \ldots a_n$. We select
two sets of these, $a_r$ and $a_s$, and seek to determine whether $a_s$
always belongs to the group specified by $a_r$. Our previous knowledge
will enable us, in general, to rule out many of the object's
characters as being irrelevant to the groups specified by $a_r$ and $a_s$,
although this will not be possible in the most fundamental inquiries.
We may also know that certain characters are always
associated with $a_r$ or with $a_s$. But there will be left a residuum
of whose connection with $a_r$ or $a_s$ we are ignorant. These
characters, whose relevance is in doubt, may be represented by
$a_{r+1} \ldots a_{s-1}$. If the analogy is perfect, these characters are
eliminated altogether. Otherwise, the argument is weakened
in proportion to the comprehensiveness of these doubtful characters.
For it may be the case that some of $a_{r+1} \ldots a_{s-1}$ are
necessary as well as $a_r$, in order to specify all the generators
which are required to produce $a_s$.

5. We may possibly be justified in neglecting certain of the
characters $a_{r+1} \ldots a_{s-1}$ by direct judgments of irrelevance.
There are certain properties of objects which we rule out from
the beginning as wholly or largely independent and irrelevant to
all, or to some, other properties. The principal judgments of
this kind, and those alone about which we seem to feel much
confidence, are concerned with absolute position in time and
space, this class of judgments of irrelevance being summed up,
I have suggested, in the Principle of the Uniformity of Nature.
We judge that mere position in time and space cannot possibly
affect, as a determining cause, any other characters; and this
belief appears so strong and certain, although it is hard to see
how it can be based on experience, that the judgment by which
we arrive at it seems perhaps to be direct. A further type of
instance in which some philosophers seem to have trusted direct
judgments of relevance in these matters arises out of the relation
between mind and matter. They have believed that no mental
event can possibly be a necessary condition for the occurrence of
a material event.

The Principle of the Uniformity of Nature, as I interpret it,
supplies the answer, if it is correct, to the criticism that the
instances, on which generalisations are based, are all alike in
being past, and that any generalisation, which is applicable to
the future, must be based, for this reason, upon imperfect analogy.
We judge directly that the resemblance between instances, which
consists in their being past, is in itself irrelevant, and does not
supply a valid ground for impugning a generalisation.

But these judgments of irrelevance are not free from difficulty,
and we must be suspicious of using them. When I say that position
is irrelevant, I do not mean to deny that a generalisation, the
premiss of which specifies position, may be true, and that the
same generalisation without this limitation might be false. But
this is because the generalisation is incompletely stated; it
happens that objects so specified have the required characters,
and hence their position supplies a sufficient criterion. Position
may be relevant as a sufficient condition but never as a necessary
condition, and the inclusion of it can only affect the truth of a
generalisation when we have left out some other essential condition.
A generalisation which is true of one instance must be
true of another which only differs from the former by reason of
its position in time or space.

6. Excluding, therefore, the possibility of a plurality of
generators, we can justify the method of perfect analogy, and
other inductive methods in so far as they can be made to
approximate to this, by means of the assumption that the
objects in the field, over which our generalisations extend, do
not have an infinite number of independent qualities; that, in
other words, their characteristics, however numerous, cohere
together in groups of invariable connection, which are finite
in number. This does not limit the number of entities which
are only numerically distinct. In the language used at the
beginning of this chapter, the use of inductive methods can be
justified if they are applied to what we have reason to suppose
a finite system.¹

¹ Mr. C. D. Broad, in two articles "On the Relation between Induction and
Probability" (Mind, 1918 and 1920), has been following a similar line of
thought.

7. Let us now take account of a possible plurality of
generators. I mean by this the possibility that a given character
can arise in more than one way, can belong to more than
one distinct group, and can arise out of more than one generator.
$\phi$ might, for instance, be sometimes due to a generator $a_1$, and
$a_1$ might invariably produce $f$. But we could not generalise
from $\phi$ to $f$, if $\phi$ might be due in other cases to a different
generator $a_2$, which would not be competent to produce $f$.

If we were dealing with inductive correlation, where we do
not claim universality for our conclusions, it would be sufficient
for us to assume that the number of distinct generators, to which
a given property $\phi$ can be due, is always finite. To obtain validity
for universal generalisations it seems necessary to make the more
comprehensive and less plausible assumption that a finite probability
always exists that there is not, in any given case, a plurality
of causes. With this assumption we have a valid argument from
pure induction on the same lines, nearly, as before.

8. We have thus two distinct difficulties to deal with, and we
require for the solution of each a separate assumption. The
point may be illustrated by an example in which only one of the
difficulties is present. There are few arguments from analogy of
which we are better assured than that for the existence of other people.
We feel indeed so well assured of their existence that it has been
thought sometimes that our knowledge of them must be in some
way direct. But analogy does not seem to me unequal to the
proof. We have numerous experiences in our own person of
acts which are associated with states of consciousness, and we
infer that similar acts in others are likely to be associated with
similar states of consciousness. But this argument from analogy
is superior in one respect to nearly all other empirical arguments,
and this superiority may possibly explain the great confidence
which we feel in it. We do seem in this case to have
direct knowledge, such as we have in no other case, that our
states of consciousness are, sometimes at least, causally connected
with some of our acts. We do not, as in other cases,
merely observe invariable sequence or coexistence between consciousness
and act; and we do believe it to be vastly improbable
in the case of some at least of our own physical acts that they
could have occurred without a mental act to support them.
Thus, we seem to have a special assurance of a kind not usually
available for believing that there is sometimes a necessary connection
between the conclusion and the condition of the
generalisation; we doubt it only from the possibility of a
plurality of causes.

The objection to this argument on the ground that the analogy
is always imperfect, in that all the observed connections of
consciousness and act are alike in being mine, seems to me to be
invalid on the same ground as that on which I have put on one
side objections to future generalisations, which are based on the
fact that the instances which support them are all alike in being
past. If direct judgments of irrelevance are ever permissible,
there seems some ground for admitting one here.

9. As a logical foundation for Analogy, therefore, we seem to
need some such assumption as that the amount of variety in the
universe is limited in such a way that there is no one object so
complex that its qualities fall into an infinite number of independent
groups (i.e. groups which might exist independently
as well as in conjunction); or rather that none of the objects
about which we generalise are as complex as this; or at least
that, though some objects may be infinitely complex, we sometimes
have a finite probability that an object about which we
seek to generalise is not infinitely complex.

To meet a possible plurality of causes some further assumption
is necessary. If we were content with Inductive Correlations
and sought to prove merely that there was a probability in favour
of any instance of the generalisation in question, without inquiring
whether there was a probability in favour of every instance,
it would be sufficient to suppose that, while there may be more
than one sufficient cause of a character, there is not an infinite
number of distinct causes competent to produce it. And this
involves no new assumption; for if the aggregate variety of the
system is finite, the possible plurality of causes must also be finite.
If, however, our generalisation is to be universal, so that it breaks
down if there is a single exception to it, we must obtain, by some
means or other, a finite probability that the set of characters
which condition the generalisation is not the possible effect of
more than one distinct set of fundamental properties. I do not
know upon what ground we could establish a finite probability
to this effect. The necessity for this seemingly arbitrary hypothesis
strongly suggests that our conclusions should be in the
form of inductive correlations, rather than of universal generalisations.
Perhaps our generalisations should always run: 'It is
probable that any given $\phi$ is $f$,' rather than, 'It is probable that
all $\phi$ are $f$.' Certainly, what we commonly seem to hold with conviction
is the belief that the sun will rise to-morrow, rather than
the belief that the sun will always rise so long as the conditions
explicitly known to us are fulfilled. This will be matter for
further discussion in Part V., when Inductive Correlation is
specifically dealt with.
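
The contrast between the two forms of conclusion can be put numerically under a deliberately simple, hypothetical model: the character $\phi$ is due either to a generator $a_1$, which always produces $f$, or to a generator $a_2$, which never does, with invented weights, and distinct instances are treated as independent. Neither the weights nor the independence assumption comes from the text, and the names below are illustrative only.

```python
# Hypothetical plurality of generators for the character phi:
# name -> (probability that a given phi is due to it, does it produce f?)
GENERATORS = {"a1": (0.95, True), "a2": (0.05, False)}


def prob_given_phi_is_f(generators):
    """'It is probable that any given phi is f': the total weight of the
    generators that are competent to produce f."""
    return sum(weight for weight, makes_f in generators.values() if makes_f)


def prob_all_phi_are_f(generators, m):
    """'All phi are f' over m independent instances: every one must be due
    to a generator competent to produce f, since a single exception
    overthrows the universal generalisation."""
    return prob_given_phi_is_f(generators) ** m


for m in (1, 10, 100, 1000):
    print(m, prob_given_phi_is_f(GENERATORS), prob_all_phi_are_f(GENERATORS, m))
```

The probability for any given instance stays fixed at 0.95, while that for the universal claim falls towards zero as the range of instances widens. Only if the weight of $a_2$ is zero, that is, if there is no plurality of causes, does the universal claim keep its probability.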

10. There is a vagueness, it may be noticed, in the number of
instances which would be required on the above assumptions
to establish a given numerical degree of probability, and this
vagueness corresponds to the vagueness in the degree of probability
which we do actually attach to inductive conclusions. We assume
that the necessary number of instances is finite, but we do not
know what the number is. We know that the probability of a
well-established induction is great, but, when we are asked to
name its degree, we cannot. Common sense tells us that some
inductive arguments are stronger than others, and that some
are very strong. But how much stronger or how strong we
cannot express. The probability of an induction is only
numerically definite when we are able to make definite assumptions
about the number of independent equiprobable influences
at work. Otherwise, it is non-numerical, though bearing relations
of greater and less to numerical probabilities according to the
approximate limits within which our assumption as to the possible
number of these causes lies.
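
The dependence of the required number of instances on the underlying assumptions can be illustrated under a simple Bayesian reconstruction (our assumption, not the text's): each instance is certain if the generalisation holds and has probability $1 - \epsilon$ if it fails, so that $n$ instances raise a prior $p_0$ to $p_0 / (p_0 + (1 - p_0)(1 - \epsilon)^n)$. The sketch solves for the smallest $n$ reaching a target probability. The values of $p_0$ and $\epsilon$ are precisely what, on the argument above, we cannot fix definitely; they are arbitrary here, and the function names are hypothetical.

```python
import math


def posterior(p0, eps, n):
    """Probability of the generalisation after n confirming instances."""
    return p0 / (p0 + (1 - p0) * (1 - eps) ** n)


def instances_needed(p0, eps, target):
    """Smallest n for which posterior(p0, eps, n) >= target."""
    if not (0 < p0 < 1 and 0 < eps < 1 and 0 < target < 1):
        raise ValueError("p0, eps and target must lie strictly between 0 and 1")
    if p0 >= target:
        return 0
    # Solve (1 - eps)**n <= p0 (1 - target) / ((1 - p0) target) for n.
    bound = p0 * (1 - target) / ((1 - p0) * target)
    n = max(0, math.ceil(math.log(bound) / math.log(1 - eps)))
    # Guard against floating-point error at the boundary.
    while posterior(p0, eps, n) < target:
        n += 1
    while n > 0 and posterior(p0, eps, n - 1) >= target:
        n -= 1
    return n


for p0, eps in [(0.1, 0.1), (0.01, 0.1), (0.1, 0.01)]:
    print(f"p0={p0}, eps={eps}: {instances_needed(p0, eps, target=0.99)} instances")
```

Shrinking either the prior or the per-instance shortfall increases the number of instances required, and nothing in experience fixes these two quantities precisely; the sketch thus mirrors the vagueness described in the paragraph.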

11. Up to this point I have supposed, for the sake of simplicity,
that it is necessary to make our assumptions as to the limitation
of independent variety in an absolute form, to assume, that is to
say, the finiteness of the system, to which the argument is applied,
for certain. But we need not in fact go so far as this.

If our conclusion is $C$ and our empirical evidence is $E$, then,
in order to justify inductive methods, our premisses must include,
in addition to $E$, a general hypothesis $H$ such that $C/H$, the
a priori probability of our conclusion, has a finite value. The
effect of $E$ is to increase the probability of $C$ above its initial
a priori value, $C/HE$ being greater than $C/H$. But the method
of strengthening $C/H$ by the addition of evidence $E$ is valid quite
apart from the particular content of $H$. If, therefore, we have
another general hypothesis $H'$ and other evidence $E'$, such that
$H/H'$ has a finite value, we can, without being guilty of a circular
argument, use evidence $E'$ by the same method as before to
strengthen the probability $H/H'$. If we call $H$, namely, the
absolute assertion of the finiteness of the system under consideration,
the inductive hypothesis, and the process of strengthening
$C/H$ by the addition of $E$ the inductive method, it is not circular to
use the inductive method to strengthen the inductive hypothesis
itself, relative to some more primitive and less far-reaching assumption.
If, therefore, we have any reason ($H'$) for attributing
a priori a finite probability to the Inductive Hypothesis ($H$), then
the actual conformity of experience a posteriori with expectations
based on the assumption of $H$ can be utilised by the inductive
method to attribute an enhanced value to the probability of $H$.
To this extent, therefore, we can support the Inductive Hypothesis
by experience. In dealing with any particular question we can
take the Inductive Hypothesis, not at its a priori value, but at
the value to which experience in general has raised it. What
we require a priori, therefore, is not the certainty of the Inductive
Hypothesis, but a finite probability in its favour.¹
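
The two-stage structure of this argument can be sketched numerically. Everything concrete here is a hypothetical assumption: a finite prior for the Inductive Hypothesis $H$ relative to $H'$, a model in which each past experience conforms to expectation with certainty if $H$ is true and with a fixed probability below one if it is false, and invented values for the probability of $C$ on $H$ and on its contradictory. As the footnote to this passage notes, the argument presupposes that evidence supporting $H$ also strengthens the arguments $H$ would strengthen; the model simply builds that in by making $C$ more probable on $H$ than without it. The function names are illustrative.

```python
def update_hypothesis(prior_h, conform_if_not_h, m):
    """Raise the probability of H (relative to H') by m conforming
    experiences, each certain on H and of probability conform_if_not_h
    (< 1) on not-H."""
    return prior_h / (prior_h + (1 - prior_h) * conform_if_not_h ** m)


def prob_conclusion(prob_h, c_given_h, c_given_not_h):
    """Probability of C, taking H at whatever value it currently has."""
    return prob_h * c_given_h + (1 - prob_h) * c_given_not_h


h_a_priori = 0.05  # hypothetical finite a priori value of H/H'
h_now = update_hypothesis(h_a_priori, conform_if_not_h=0.8, m=30)
print(round(h_now, 3))
# C is taken at the value experience has raised H to, not at its a priori value.
print(prob_conclusion(h_a_priori, 0.9, 0.1) < prob_conclusion(h_now, 0.9, 0.1))  # True
```

Nothing in the update requires $H$ to be certain at the outset; it only requires the a priori value to be non-zero.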

Our assumption, in its most limited form, then, amounts to
this, that we have a finite a priori probability in favour of
the Inductive Hypothesis as to there being some limitation
of independent variety (to express shortly what I have already
explained in detail) in the objects of our generalisation. Our
experience might have been such as to diminish this probability
a posteriori. It has, in fact, been such as to increase it. It is
because there has been so much repetition and uniformity in our
experience that we place great confidence in it. To this extent
the popular opinion that Induction depends upon experience for
its validity is justified and does not involve a circular argument.

¹ I have implicitly assumed in the above argument that if $E'$ supports $H$, it
strengthens an argument which $H$ would strengthen. This is not necessarily
the case for the reasons given on pp. 68 and 147. In these passages the
necessary conditions for the above are elucidated. I am, therefore, assuming
that in the case now in question these conditions actually are fulfilled.

12. I think that this assumption is adequate to its purpose
and would justify our ordinary methods of procedure in inductive
argument. It was suggested in the previous chapter that our
theory of Analogy ought to be as applicable to mathematical
as to material generalisations, if it is to justify common sense.
The above assumptions of the limitation of independent variety
sufficiently satisfy this condition. There is nothing in these
assumptions which gives them a peculiar reference to material
objects. We believe, in fact, that all the properties of numbers
can be derived from a limited number of laws, and that the same
set of laws governs all numbers. To apply empirical methods to
such things as numbers renders it necessary, it is true, to make
an assumption about the nature of numbers. But it is the same
kind of assumption as we have to make about material objects,
and has just about as much, or as little, plausibility. There is
no new difficulty.

The assumption, also, that the system of Nature is finite is
in accordance with the analysis of the underlying assumption of
scientists, given at the close of the previous chapter. The
hypothesis of atomic uniformity, as I have called it, while not
formally equivalent to the hypothesis of the limitation of independent
variety, amounts to very much the same thing. If the
fundamental laws of connection changed altogether with variations,
for instance, in the shape or size of bodies, or if the laws
governing the behaviour of a complex had no relation whatever
to the laws governing the behaviour of its parts when belonging
to other complexes, there could hardly be a limitation of independent
variety in the sense in which this has been defined. And,
on the other hand, a limitation of independent variety seems
necessarily to carry with it some degree of atomic uniformity.
The underlying conception as to the character of the System of
Nature is in each case the same.

13. We have now reached the last and most difficult stage of
the discussion. The logical part of our inquiry is complete, and
it has left us, as it is its business to leave us, with a question of
epistemology. Such is the premiss or assumption which our
logical processes need to work upon. What right have we to
make it? It is no sufficient answer in philosophy to plead that
the assumption is after all a very little one.

I do not believe that any conclusive or perfectly satisfactory
answer to this question can be given, so long as our knowledge
of the subject of epistemology is in so disordered and undeveloped
a condition as it is in at present. No proper answer has yet been
given to the inquiry: of what sorts of things are we capable of
direct knowledge? The logician, therefore, is in a weak position
when he leaves his own subject and attempts to solve a particular
instance of this general problem. He needs guidance as to what
kind of reason we could have for such an assumption as the use
of inductive argument appears to require.

On the one hand, the assumption may be absolutely a priori
in the sense that it would be equally applicable to all possible
objects. On the other hand, it may be seen to be applicable to
some classes of objects only. In this case it can only arise out
of some degree of particular knowledge as to the nature of the
objects in question, and is to this extent dependent on experience.
But if it is experience which in this sense enables us to know the
assumption as true of certain amongst the objects of experience,
it must enable us to know it in some manner which we may term
direct and not as the result of an inference.

Now an assumption, that all systems of fact are finite (in the
sense in which I have defined this term), cannot, it seems perfectly
plain, be regarded as having absolute, universal validity in the
sense that such an assumption is self-evidently applicable to every
kind of object and to all possible experiences. It is not, therefore,
in quite the same position as a self-evident logical axiom, and does
not appeal to the mind in the same way. The most which can
be maintained is that this assumption is true of some systems of
fact, and, further, that there are some objects about which, as
soon as we understand their nature, the mind is able to apprehend
directly that the assumption in question is true.

In Chapter II. § 7, I wrote: "By some mental process of
which it is difficult to give an account, we are able to pass from
direct acquaintance with things to a knowledge of propositions
about the things of which we have sensations or understand the
meaning." Knowledge, so obtained, I termed direct knowledge.
From a sensation of yellow and from an understanding of the
meaning of 'yellow' and of 'colour,' we could, I suggested,
have direct knowledge of the fact or proposition 'yellow is a
colour;' we might also know that colour cannot exist without
extension, or that two colours cannot be perceived at the same
time in the same place. Other philosophers might use terms
differently and express themselves otherwise; but the substance
of what I was there trying to say is not very disputable. But
when we come to the question as to what kinds of propositions
we can come to know in this manner, we enter upon an
unexplored field where no certain opinion is discoverable.

In the case of logical terms, it seems to be generally agreed
that if we understand their meaning we can know directly
propositions about them which go far beyond a mere expression of
this meaning: propositions of the kind which some
philosophers have termed synthetic. In the case of non-logical or
empirical entities, it seems sometimes to be assumed that our
direct knowledge must be confined to what may be regarded as
an expression or description of the meaning or sensation
apprehended by us. If this view is correct, the Inductive Hypothesis
is not the kind of thing about which we can have direct
knowledge as a result of our acquaintance with objects.

I suggest, however, that this view is incorrect, and that we
are capable of direct knowledge about empirical entities which
goes beyond a mere expression of our understanding or sensation
of them. It may be useful to give the reader two examples, more
familiar than the Inductive Hypothesis, where, as it appears to
me, such knowledge is commonly assumed. The first is that of the
causal irrelevance of mere position in time and space, commonly
called the Uniformity of Nature. We do believe, and yet have
no adequate inductive reason whatever for believing, that mere
position in time and space cannot make any difference. This
belief arises directly, I think, out of our acquaintance with
the objects of experience and our understanding of the concepts
of 'time' and 'space.' The second is that of the Law of
Causation. We believe that every object in time has a
'necessary' connection1 with some set of objects at a previous time.
This belief also, I think, arises in the same way. It is to be
noticed that neither of these beliefs clearly arises, in spite of the
directness which may be claimed for them, out of any one single
experience. In a way analogous to these, the validity of assuming
the Inductive Hypothesis, as applied to a particular class of
objects, appears to me to be justified.

1 I do not propose to define the meaning of this.

Our justification for using inductive methods in an argument
about numbers arises out of our perceiving directly, when we
understand the meaning of a number, that they are of the
required character.1 And when we perceive the nature of our
phenomenal experiences, we have a direct assurance that in their
case also the assumption is legitimate. We are capable, that
is to say, of direct synthetic knowledge about the nature
of the objects of our experience. On the other hand, there
may be some kinds of objects about which we have no such
assurance and to which inductive methods are not reasonably
applicable. It may be the case that some metaphysical questions
are of this character and that those philosophers have been right
who have refused to apply empirical methods to them.

14. I do not pretend that I have given any perfectly adequate
reason for accepting the theory I have expounded, or any such
theory. The Inductive Hypothesis stands in a peculiar position
in that it seems to be neither a self-evident logical axiom nor an
object of direct acquaintance; and yet it is just as difficult, as
though the inductive hypothesis were either of these, to remove
from the organon of thought the inductive method which can
only be based on it or on something like it.

As long as the theory of knowledge is so imperfectly
understood as now, and leaves us so uncertain about the grounds
of many of our firmest convictions, it would be absurd to
confess to a special scepticism about this one. I do not think
that the foregoing argument has disclosed a reason for such
scepticism. We need not lay aside the belief that this conviction
gets its invincible certainty from some valid principle darkly
present to our minds, even though it still eludes the peering
eyes of philosophy.

1 Since numbers are logical entities, it may be thought less unorthodox to
make such an assumption in their case.


CHAPTER XXIII

SOME HISTORICAL NOTES ON INDUCTION

1. THE number of books, which deal with inductive1 theory, is
extraordinarily small. It is usual to associate the subject with
the names of Bacon, Hume, and Mill. In spite of the modern
tendency to depreciate the first and the last of these, they are the
principal names, I think, with which the history of induction
ought to be associated. The next place is held by Laplace and
Jevons. Amongst contemporary logicians there is an almost
complete absence of constructive theory, and they content
themselves for the most part with the easy task of criticising
Mill, or with the more difficult one of following him.

That the inductive theories of Bacon and of Mill are full of
errors and even of absurdities is, of course, a commonplace of
criticism. But when we ignore details, it becomes clear that they
were really attempting to disentangle the essential issues. We
depreciate them partly, perhaps, as a reaction from the view once
held that they helped the progress of scientific discovery. For
it is not plausible to suppose that Newton owed anything to Bacon,
or Darwin to Mill. But with the logical problem their minds
were truly occupied, and in the history of logical theory they
should always be important.

It is true, nevertheless, that the advancement of science was
the main object which Bacon himself, though not Mill, believed
that his philosophy would promote. The Great Instauration was
intended to promulgate an actual method of discovery entirely
different from any which had been previously known.2 It did
not do this, and against such pretensions Macaulay's well-known
essay was not unjustly directed. Mill, however, expressly
disclaimed in his preface any other object than to classify and
generalise the practices "conformed to by accurate thinkers in
their scientific inquiries." Whereas Bacon offered rules and
demonstrations, hitherto unknown, with which any man could
solve all the problems of science by taking pains, Mill admitted
that "in the existing state of the cultivation of the sciences,
there would be a very strong presumption against any one
who should imagine that he had effected a revolution in the
theory of the investigation of truth, or added any fundamentally
new process to the practice of it."

1 See note at the end of this chapter on "The Use of the Term Induction."

2 He speaks of himself as being "in hac re plane protopirus, et vestigia
nullius sequutus"; and in the Praefatio generalis he compares his method to
the mariner's compass, until the discovery of which no wide sea could be
crossed (see Spedding and Ellis, vol. i. p. 24).

2. The theories of both seem to me to have been injured,
though in different degrees, by a failure to keep quite distinct
the three objects: (1) of helping the scientist, (2) of explaining
and analysing his practice, and (3) of justifying it. Bacon was
really interested in the second as well as in the first, and was
led to some of his methods by reflecting upon what distinguished
good arguments from bad in actual investigations. To logicians
his methods were as new as he claimed, but they had their
origin, nevertheless, in the commonest inferences of science and
daily life. But his main preoccupation was with the first, which
did injury to his treatment of the third. He himself became
aware as the work progressed that, in his anxiety to provide
an infallible mode of discovery, he had put forth more than he
would ever be able to justify.1 His own mind grew doubtful,
and the most critical parts of the description of the new method
were never written. No one who has reflected much upon
Induction need find it difficult to understand the progress and
development of Bacon's thoughts. To the philosopher who first
distinguished some of the complexities of empirical proof in a
generalised, and not merely a particular, form, the prospects of
systematising these methods must have seemed extraordinarily
hopeful. The first investigator could not have anticipated that
Induction, in spite of its apparent certainty, would prove so
elusive to analysis.

1 This view is taken in the edition of James Spedding and Leslie Ellis.
Their introductions to Bacon's philosophical works seem to me to be very greatly
superior to the accounts to be found elsewhere. They make intelligible what
seems, according to other commentaries, fanciful and without sense or reason.

Mill also was led, in a not dissimilar way, to attempt a too
simple treatment, and, in seeking for ease and certainty, to
treat far too lightly the problem of justifying what he had
claimed. Mill shirks, almost openly, the difficulties; and scarcely
attempts to disguise from himself or his readers that he grounds
induction upon a circular argument.

3. Some of the most characteristic errors both of Bacon and
of Mill arise, I think, out of a misapprehension which it has been
a principal object of this book to correct. Both believed, without
hesitation it seems, that induction is capable of establishing a
conclusion which is absolutely certain, and that an argument
is invalid if the generalisation which it supports admits of
exceptions in fact. "Absolute certainty," says Leslie Ellis,1 "is
one of the distinguishing characters of the Baconian induction."
It was mainly in this respect that it improved upon the older
induction per enumerationem simplicem. "The induction which
the logicians speak of," Bacon argues in the Advancement of
Learning, "is utterly vicious and incompetent. . . . For to
conclude upon an enumeration of particulars, without instance
contradictory, is no conclusion but a conjecture." The conclusions
of the new method, unlike those of the old, are not liable to be
upset by further experience. In the attempt to justify these
claims and to obtain demonstrative methods, it was necessary
to introduce assumptions for which there was no warrant.

1 Op. cit. vol. i. p. 23.

2 When he deals with Plurality of Causes, for instance.

3 Bk. iii. chap. iii. § 3 (the italics are mine).

Precisely similar claims were made by Mill for his own rules of
procedure, although there are passages in which he abates them.2
An induction has no validity, according to him as according to
Bacon, unless it is absolutely certain. The following passage3
is significant of the spirit in which the subject was approached
by him: "Let us compare a few cases of incorrect inductions
with others which are acknowledged to be legitimate. Some, we
know, which were believed for centuries to be correct, were
nevertheless incorrect. That all swans are white, cannot have
been a good induction, since the conclusion has turned out
erroneous. The experience, however, on which the conclusion
rested was genuine." Mill has not justly apprehended the
relativity of all inductive arguments to the evidence, nor the
element of uncertainty which is present, more or less, in all the
generalisations which they support.1 Mill's methods would yield
certainty, if they were correct, just as Bacon's would. It is the
necessity, to which Mill had subjected himself, of obtaining
certainty that occasions their want of reality. Bacon and Mill
both assume that experiment can shape and analyse the evidence
in a manner and to an extent which is not in fact possible. In
the aims and expectations with which they attempt to solve the
inductive problem, there is on fundamental points an unexpectedly
close resemblance between them.

1 This misapprehension may be connected with Mill's complete failure to
grasp with any kind of thoroughness the nature and importance of the theory of
probability. The treatment of this topic in the System of Logic is exceedingly
bad. His understanding of the subject was, indeed, markedly inferior to the
best thought of his own time.

4. Turning from these general criticisms to points of greater
detail, we find that the line of thought pursued by Mill was
essentially the same as that which had been pursued by Bacon,
and, also, that the argument of the preceding chapters is, in
spite of some real differences, a development of the same
fundamental ideas which underlie, as it seems to me, the theories of
Mill and Bacon alike.

We have seen that all empirical arguments require an initial
probability derived from analogy, and that this initial probability
may be raised towards certainty by means of pure induction
or the multiplication of instances. In some arguments we depend
mainly upon analogy, and the initial probability obtained by
means of it (with the assistance, as a rule, of previous knowledge)
is so large that numerous instances are not required. In other
arguments pure induction predominates. As science advances
and the body of pre-existing knowledge is increased, we depend
increasingly upon analogy; and only at the earlier stages of our
investigations is it necessary to rely, for the greater part of our
support, upon the multiplication of instances. Bacon's great
achievement, in the history of logical theory, lay in his being the
first logician to recognise the importance of methodical analogy
to scientific argument and the dependence upon it of most
well-established conclusions. The Novum Organum is mainly
concerned with explaining methodical ways of increasing what I
have termed the Positive and Negative Analogies, and of avoiding
false Analogies. The use of exclusions and rejections, to which
Bacon attached supreme importance, and which he held to
constitute the essential superiority of his method over those which
preceded it, entirely consists in the determination of what
characters (or natures as he would call them) belong to the positive
and negative analogies respectively. The first two tables with
which the investigation begins are, first, the table essentiae et
praesentiae, which contains all known instances in which the
given nature is present, and, second, the table declinationis sive
absentiae in proximo, which contains instances corresponding in
each case to those of the first table, but in which, notwithstanding
this correspondence, the given nature is absent.1 The doctrine
of prerogative instances is concerned no less plainly with the
methodical determination of Analogy. And the doctrine of
idols is expounded for the avoidance of false analogies, standing,
he says, in the same relation to the interpretation of Nature as
the doctrine of fallacies to ordinary logic.2 Bacon's error lay
in supposing that, because these methods were new to logic, they
were therefore new to practice. He exaggerated also their
precision and their certainty; and he underestimated the
importance of pure induction. But there was, at bottom, nothing about
his rules impracticable or fantastic, or indeed unusual.

1 Ellis, vol. i. p. 33.

2 Ellis, vol. i. p. 89.

3 Book iv. chap. x. § 2.

5. Almost the whole of the preceding paragraph is equally
applicable to Mill. He agreed with Bacon in depreciating the
part played in scientific inquiry by pure induction, and in
emphasising the importance of analogy to all systematic
investigators. But he saw further than Bacon in allowing for the
Plurality of Causes, and in admitting that an element of pure
induction was therefore made necessary. "The Plurality of
Causes," he says,3 "is the only reason why mere number of
instances is of any importance in inductive inquiry. The tendency
of unscientific inquirers is to rely too much on number, without
analysing the instances. . . . Most people hold their conclusions
with a degree of assurance proportioned to the mere mass of the
experience on which they appear to rest; not considering that
by the addition of instances to instances, all of the same kind,
that is, differing from one another only in points already
recognised as immaterial, nothing whatever is added to the evidence of
the conclusion. A single instance eliminating some antecedent
which existed in all the other cases, is of more value than the
greatest multitude of instances which are reckoned by their
number alone." Mill did not see, however, that our knowledge
of the instances is seldom complete, and that new instances, which
are not known to differ from the former in material respects, may
add, nevertheless, to the negative analogy, and that the
multiplication of them may, for this reason, strengthen the evidence.
It is easy to see that his methods of Agreement and Difference
closely resemble Bacon's, and aim, like Bacon's, at the
determination of the Positive and Negative Analogies. By allowing
for Plurality of Causes Mill advanced beyond Bacon. But he
was pursuing the same line of thought which alike led to Bacon's
rules and has been developed in the chapters of this book.
Like Bacon, however, he exaggerated the precision with which
his canons of inquiry could be used in practice.
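The determination of the Positive and Negative Analogies by Bacon's first two tables, and by Mill's kindred methods of Agreement and Difference, may be pictured as elementary operations on sets. The Python sketch below is illustrative only and is not Bacon's or Mill's own formulation: it assumes, as the method of exclusions requires, that every instance resolves into a finite and fully known list of simple natures, and the function name `exclude` and the sample natures are hypothetical.

```python
def exclude(presence, absence, natures):
    """Return the natures that survive a Bacon-style method of exclusions.

    Hypothetical sketch. Assumes each instance is fully described by a finite
    set of simple natures drawn from `natures`.

    presence: instances in which the given nature is present
              (cf. the table essentiae et praesentiae)
    absence:  corresponding instances in which it is absent
              (cf. the table declinationis sive absentiae in proximo)
    """
    candidates = set(natures)
    # Reject any nature missing from some instance of presence.
    for instance in presence:
        candidates &= set(instance)
    # Reject any nature found in some instance of absence.
    for instance in absence:
        candidates -= set(instance)
    return candidates


# Hypothetical instances, each described by the simple natures it exhibits.
natures = {"light", "motion", "rarity", "colour"}
presence = [{"light", "motion", "rarity"}, {"motion", "colour", "rarity"}]
absence = [{"light", "rarity"}]

print(exclude(presence, absence, natures))  # {'motion'}
```

Here `motion` alone survives, because it occurs in every instance of presence and is missing from the instance of absence. The sketch returns its answer with exactly the certainty criticised above: it takes our knowledge of each instance to be complete, and where the given nature may arise from a plurality of causes, the intersection over the instances of presence can reject a genuine cause altogether.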

6. No more need be said respecting method and analysis.
But in both writers the exposition of method is closely
intermingled with attempts to justify it. There is nothing in Bacon
which at all corresponds to Mill's appeals to Causation or to the
Uniformity of Nature, and, when they seek for the ground of
induction, there is much that is peculiar to each writer. It is
my purpose, however, to consider in this place the details common
to both, which seem to me to be important and which exemplify
the only line of investigation which seems likely to be fruitful;
and I shall pursue no further, therefore, their numerous points
of difference.

The attempt, which I have made to justify the initial
probability which Analogy seems to supply, primarily depends upon
a certain limitation of independent variety and upon the
derivation of all the properties of any given object from a limited
number of primary characters. In the same way I have supposed
that the number of primary characters which are capable of
producing a given property is also limited. And I have argued
that it is not easy to see how a finite probability is to be obtained
unless we have in each case some such limitation in the number
of the ultimate alternatives.

It was in a manner which bears fundamental resemblances
to this that Bacon endeavoured to demonstrate the cogency of
his method. He considers, he says, "the simple forms or
differences of things which are few in number, and the degrees and
co-ordinations whereof make all this variety." And in Valerius
Terminus he argues "that every particular that worketh any
effect is a thing compounded more or less of diverse single natures,
more manifest and more obscure, and that it appeareth not to
which of the natures the effect is to be ascribed."1 It is indeed
essential to the method of exclusions that the matter to which it
is applied should be somehow resolvable into a finite number of
elements. But this assumption is not peculiar, I think, to
Bacon's method, and is involved, in some form or other, in every
argument from Analogy. In making it Bacon was initiating,
perhaps obscurely, the modern conception of a finite number of
laws of nature out of the combinations of which the almost
boundless variety of experience ultimately arises. Bacon's error was
double and lay in supposing, first, that these distinct elements
lie upon the surface and consist in visible characters, and second,
that their natures are, or easily can be, known to us, although
he never wrote the part of the Instauration in which the manner
of conceiving simple natures was to be explained. These
beliefs falsely simplified the problem as he saw it, and led him
to exaggerate the ease, certainty, and fruitfulness of the new
method. But the view that it is possible to reduce all the
phenomena of the universe to combinations of a limited number
of simple elements (which is, according to Ellis,2 the central
point of Bacon's whole system) was a real contribution to
philosophy.

1 Quoted by Ellis, vol. i. p. 41.

2 Vol. i. p. 28.

7. The assumption that every event can be analysed into a
limited number of ultimate elements is never, so far as I am
aware, explicitly avowed by Mill. But he makes it in almost
every chapter, and it underlies, throughout, his mode of procedure.
His methods and arguments would fail immediately if we were
to suppose that phenomena of infinite complexity, due to an
infinite number of independent elements, were in question, or
if an infinite plurality of causes had to be allowed for.

In distinguishing, therefore, analogy from pure induction,
and in justifying it by the assumption of a limited complexity in
the problems which we investigate, I am, I think, pursuing, with
numerous differences, the line of thought which Bacon first
pursued and which Mill popularised. The method of treatment
is dissimilar, but the subject-matter and the underlying beliefs
are, in each case, the same.

8. Between Bacon and Mill came Hume. Hume's sceptical
criticisms are usually associated with causality; but argument
by induction (inference from past particulars to future
generalisations) was the real object of his attack. Hume showed, not that
inductive methods were false, but that their validity had never
been established and that all possible lines of proof seemed
equally unpromising. The full force of Hume's attack and the
nature of the difficulties which it brought to light were never
appreciated by Mill, and he makes no adequate attempt to
deal with them. Hume's statement of the case against induction
has never been improved upon; and the successive attempts
of philosophers, led by Kant, to discover a transcendental
solution have prevented them from meeting the hostile arguments on
their own ground and from finding a solution along lines which
might, conceivably, have satisfied Hume himself.

1 Couturat, Opuscules et fragments inédits de Leibniz, p. 232.

2 Couturat, La Logique de Leibniz d'après des documents inédits, pp. 202, 207.

9. It would not be just here to pass by entirely the name
of the great Leibniz, who, wiser in correspondence and
fragmentary projects than in completed discourses, has left to us
sufficient indications that his private reflections on this subject
were much in advance of his contemporaries'. He distinguished
three degrees of conviction amongst opinions: logical certainty
(or, as we should say, propositions known to be formally true);
physical certainty, which is only logical probability, of which a
well-established induction, as that man is a biped, is the type;
and physical probability (or, as we should say, an inductive
correlation), as for example that the south is a rainy quarter.1
He condemned generalisations based on mere repetition of
instances, which he declared to be without logical value, and he
insisted on the importance of Analogy as the basis of a valid
induction.2 He regarded a hypothesis as more probable in
proportion to its simplicity and its power, that is to say, to the
number of the phenomena it would explain and the fewness of
the assumptions it involved. In particular a power of accurate
prediction and of explaining phenomena or experiments
previously untried is a just ground of secure confidence, of which
he cites as a nearly perfect example the key to a cryptogram.1

10. Whewell and Jevons furnished logicians with a storehouse
of examples derived from the practice of scientists.
Jevons, partly anticipated by Laplace, made an important
advance when he emphasised the close relation between
Induction and Probability. Combining insight and error, he
spoilt brilliant suggestions by erratic and atrocious arguments.
His application of Inverse Probability to the inductive problem
is crude and fallacious, but the idea which underlies it is
substantially good. He, too, made explicit the element of
Analogy, which Mill, though he constantly employed it, had
seldom called by its right name. There are few books, so
superficial in argument yet suggesting so much truth, as Jevons's
Principles of Science.

11. Modern text-books on Logic all contain their chapters on
Induction, but contribute little to the subject. Their
recognition of Mill's inadequacy renders their exposition, which, in spite
of criticisms, is generally along his lines, nerveless and confused.
Where Mill is clear and offers a solution, they, confusedly
criticising, must withhold one. The best of them, Sigwart and
Venn, contain criticism and discussion which is interesting, but
constructive theory is lacking. Hitherto Hume has been master,
only to be refuted in the manner of Diogenes or Dr. Johnson.

1 Letter to Conring, 10th March 1678.


NOTES ON PART III

(i.) ON THE USE OF THE TERM INDUCTION

1. INDUCTION is in origin a translation of the Aristotelian ἐπαγωγή.
This term was used by Aristotle in two quite distinct senses: first,
and principally, for the process by which the observation of particular
instances, in which an abstract notion is exemplified, enables us to
realise and comprehend the abstraction itself; secondly, for the type
of argument in which we generalise after the complete enumeration
and assertion of all the particulars which the generalisation embraces.
From this second sense it was sometimes extended to cases in which
we generalise after an incomplete enumeration. In post-Aristotelian
writers the induction per enumerationem simplicem approximates to
induction in Aristotle's second sense, as the number of instances is
increased. To Bacon, therefore, "the induction of which the logicians
speak" meant a method of argument by multiplication of instances.
He himself deliberately extended the use of the term so as to cover
all the systematic processes of empirical generalisation. But he
also used it, in a manner closely corresponding to Aristotle's first use,
for the process of forming scientific conceptions and correct notions
of "simple natures."1

2. The modern use of the term is derived from Bacon's. Mill
defines it as "the operation of discovering and proving general
propositions." His philosophical system required that he should
define it as widely as this; but the term has really been used, both
by him and by other logicians, in a narrower sense, so as to cover
those methods of proving general propositions, which we call
empirical, and so as to exclude generalisations, such as those of mathematics,
which have been proved formally. Jevons was led, partly by the
linguistic resemblance, partly because in the one case we proceed
from the particular to the general and in the other from the general
to the particular, to define Induction as the inverse process of
Deduction. In contemporary logic Mill's use prevails; but there

1 See Ellis's edition of Bacon's Works, vol. i. p. 37. On the first occasion
on which Induction is mentioned in the Novum Organum, it is used in this
secondary sense.


is, at the same time, a suggestion (arising from earlier usage, and
because Bacon and Mill never quite freed themselves from it) of
argument by mere multiplication of instances. I have thought it
best, therefore, to use the term pure induction to describe arguments
which are based upon the number of instances, and to use induction
itself for all those types of arguments which combine, in one form or
another, pure induction with analogy.


(ii.) ON THE USE OF THE TERM CAUSE

1. Throughout the preceding argument, as well as in Part II.,
I have been able to avoid the metaphysical difficulties which surround
the true meaning of cause. It was not necessary that I should
inquire whether I meant by causal connection an invariable
connection in fact merely, or whether some more intimate relation was
involved. It has also been convenient to speak of causal relations
between objects which do not strictly stand in the position of cause
and effect, and even to speak of a probable cause, where there is no
implication of necessity and where the antecedents will sometimes
lead to particular consequents and sometimes will not. In making
this use of the term, I have followed a practice not uncommon amongst
writers on probability, who constantly use the term cause, where
hypothesis might seem more appropriate.1

One is led, almost inevitably, to use 'cause' more widely than
'sufficient cause' or than 'necessary cause,' because, the necessary
causation of particulars by particulars being rarely apparent to us,
the strict sense of the term has little utility. Those antecedent
circumstances, which we are usually content to accept as causes, are
only so in strictness under a favourable conjunction of innumerable
other influences.

2. As our knowledge is partial, there is constantly, in our use
of the term cause, some reference implied or expressed to a limited
body of knowledge. It is clear that, whether or not, as Cournot2
maintains, there are such things as independent series in the order
of causation, there is often a sense in which we may hold that there
is a closer intimacy between some series than between others. This
intimacy is relative, I think, to particular information, which is
actually known to us, or which is within our reach. It will be useful,
therefore, to give precise definitions of these wider senses in which
it is often convenient to use the expression cause.

1 Cf. Czuber, Wahrscheinlichkeitsrechnung, p. 139. In dealing with Inverse
Probability Czuber explains that he means by possible causes the various
Bedingungskomplexe (complexes of conditions) from which the cause can result.

2 See Chapter XXIV. § 3.
We must first distinguish between assertions of law and assertions
of fact, or, in the terminology of Von Kries,1 between nomologic and
ontologic knowledge. It may be convenient in dealing with some
questions to frame this distinction with reference to the special
circumstances. But the distinction generally applicable is between
propositions which contain no reference to particular moments of
time, and existential propositions which cannot be stated without
reference to specific points in the time series. The Principle of the
Uniformity of Nature amounts to the assertion that natural laws
are all, in this sense, timeless. We may, therefore, divide our data
into two portions k and l, such that k denotes our formal and
nomologic evidence, consisting of propositions whose predication
does not involve a particular time reference, and l denotes the
existential or ontologic propositions.

3. Let us now suppose that we are investigating two existential
propositions a and b, which refer two events A and B to particular
moments of time, and that A is referred to moments which are all
prior to those at which B occurred. What various meanings can we
give to the assertion that A and B are causally connected?

(i.) If b/ak = 1, A is a sufficient cause of B. In this case A is a
cause of B in the strictest sense, b can be inferred from a, and no
additional knowledge consistent with k can invalidate this.

(ii.) If b/āk = 0, A is a necessary cause of B.

(iii.) If k includes all the laws of the existent universe, then A
is not a sufficient cause of B unless b/ak = 1. The Law of Causation,
therefore, which states that every existent has to some other previous
existent the relation of effect to sufficient cause, is equivalent to the
proposition that, if k is the body of natural law, then, if b is true,
there is always another true proposition a, which asserts existences
prior to B, such that b/ak = 1. No use has been made so far of our
existential knowledge l, which is irrelevant to the definitions
preceding.

(iv.) If b/akl = 1 and b/kl ≠ 1, A is a sufficient cause of B under
conditions l.

(v.) If b/ākl = 0 and b/kl ≠ 0, A is a necessary cause of B under
conditions l.

(vi.) If there is any existential proposition h such that b/ahk = 1
and b/hk ≠ 1, A is, relative to k, a possible sufficient cause of B.

(vii.) If there is an existential proposition h such that b/āhk = 0
and b/hk ≠ 0, A is, relative to k, a possible necessary cause of B.

(viii.) If b/ahkl = 1, b/hkl ≠ 1, and h/akl ≠ 0, A is, relative to k,
a possible sufficient cause of B under conditions l.

(ix.) If b/āhkl = 0, b/hkl ≠ 0, h/ākl ≠ 0, and h/akl ≠ 0, A is,
relative to k, a possible necessary cause of B under conditions l.


1 Die Principien der Wahrscheinlichkeitsrechnung, p. 86.



Thus an event is a possible necessary cause of another, relative to
given nomologic data, if circumstances can arise, not inconsistent
with our existential data, in which the first event will be indispensable
if the second is to occur.

(x.) Two events are causally independent if no part of either is,
relative to our nomologic data, a possible cause of any part of the
other under the conditions of our existential knowledge. The greater
the scope of our existential knowledge, the greater is the likelihood
of our being able to pronounce events causally dependent or
independent.
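The definitions above can be tried out on a small toy model. What follows is a sketch under assumptions of our own, not Keynes's method. Propositions are treated as predicates on a finite set of equally weighted "worlds", and a probability a/h is taken to be the proportion of admissible worlds in which a holds. Keynes does not hold that every probability is numerical in this way. The law LAW, the circumstance H (treated as an existential proposition), and every function name are hypothetical. In definition (iv) the existential data l is simply folded into the premiss. The sketch checks definitions (i), (ii), (iv) and (vi).

```python
from fractions import Fraction
from itertools import product

# Toy model (an assumption for illustration): a "world" assigns True/False to the
# atoms a, b, h, and a/h is the share of admissible worlds in which a holds.
ATOMS = ("a", "b", "h")
ALL_WORLDS = [dict(zip(ATOMS, values))
              for values in product([True, False], repeat=len(ATOMS))]


def A(w):
    return w["a"]


def NOT_A(w):
    return not w["a"]


def B(w):
    return w["b"]


def H(w):
    return w["h"]


def NOT_H(w):
    return not w["h"]


def both(p, q):
    """Conjunction of two propositions (predicates on worlds)."""
    return lambda w: p(w) and q(w)


def consistent(premiss, k):
    """True if at least one world satisfies both the premiss and the law k."""
    return any(k(w) and premiss(w) for w in ALL_WORLDS)


def prob(conclusion, premiss, k):
    """conclusion/premiss.k as the fraction of admissible worlds where conclusion holds."""
    worlds = [w for w in ALL_WORLDS if k(w) and premiss(w)]
    if not worlds:
        raise ValueError("premiss is inconsistent with k")
    return Fraction(sum(1 for w in worlds if conclusion(w)), len(worlds))


def is_sufficient_cause(k):
    """Definition (i): b/ak = 1."""
    return prob(B, A, k) == 1


def is_necessary_cause(k):
    """Definition (ii): b/(not-a)k = 0."""
    return prob(B, NOT_A, k) == 0


def is_sufficient_cause_under(k, l_data):
    """Definition (iv): b/akl = 1 and b/kl != 1."""
    return prob(B, both(A, l_data), k) == 1 and prob(B, l_data, k) != 1


def is_possible_sufficient_cause(k, candidates):
    """Definition (vi): some existential h gives b/ahk = 1 and b/hk != 1."""
    return any(consistent(both(A, h), k)
               and prob(B, both(A, h), k) == 1
               and prob(B, h, k) != 1
               for h in candidates)


def LAW(w):
    """Hypothetical nomologic data k: B occurs exactly when A and H both occur."""
    return w["b"] == (w["a"] and w["h"])


print(prob(B, A, LAW))                                # 1/2
print(is_sufficient_cause(LAW))                       # False
print(is_necessary_cause(LAW))                        # True
print(is_sufficient_cause_under(LAW, H))              # True
print(is_possible_sufficient_cause(LAW, [H, NOT_H]))  # True
```

Under this hypothetical law, A is a necessary but not a sufficient cause of B. It becomes a sufficient cause under conditions l when l asserts H, and for the same reason it is, relative to k, a possible sufficient cause of B.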

4. These definitions preserve the distinction between 'causally
independent' and 'independent for probability,' that is, the distinction
between causa essendi and causa cognoscendi. If b/akl ≠ b/kl,
where a and b may be any propositions whatever and are not limited
as they were in the causal definitions, we have 'dependence for
probability,' and a is a causa cognoscendi for b, relative to data kl.
If a and b are causally dependent, according to definition (x.), a is a
possible causa essendi, relative to data kl.

But, after all, the essential relation is that of 'independence for
probability.' We wish to know whether knowledge of one fact
throws light of any kind upon the likelihood of another. The theory
of causality is only important because it is thought that by means of
its assumptions light can be thrown by the experience of one
phenomenon upon the expectation of another.


PART IV

SOME PHILOSOPHICAL APPLICATIONS OF PROBABILITY
CHAPTER XXIV

THE MEANINGS OF OBJECTIVE CHANCE, AND OF RANDOMNESS

1. MANY important differences of opinion in the treatment of
Probability have been due to confusion or vagueness as to
what is meant by Randomness and by Objective Chance, as
distinguished from what, for the purposes of this chapter, may be
termed Subjective Probability. It is agreed that there is a sort
of Probability which depends upon knowledge and ignorance, and
is relative, in some manner, to the mind of the subject; but it is
supposed that there is also a more objective Probability which
is not thus dependent, or less completely so, though precisely
what this conception stands for is not plain. The relation of
Randomness to the other concepts is also obscure. The problem
of clearing up these distinctions is of importance if we are to
criticise certain schools of opinion intelligently, as well as to the
treatment of the foundations of Statistical Inference which is to
be attempted in Part V.

There are at least three distinct issues to be kept apart. There
is the antithesis between knowledge and ignorance, between
events, that is to say, which we have some reason to expect, and
events which we have no reason to expect, which gives rise to
the theory of subjective probability and subjective chance; and,
connected with this, the distinction between 'random' selection
and 'biassed' selection. There are next objective probability and
objective chance, which are as yet obscure, but which are
commonly held to arise out of the antithesis between 'cause' and
'chance,' between events, that is to say, which are causally
connected and events which are not causally connected. And there
is, lastly, the antithesis between chance and design, between
'blind causes' and 'final causes,' where we oppose a 'chance'
event to one, part of whose cause is a volition following on a
conscious desire for the event.1

2. The method of this treatise has been to regard subjective
probability as fundamental and to treat all other relevant
conceptions as derivative from this. That there is such a thing as
probability in this sense has been admitted by all sensible
philosophers since the middle of the eighteenth century at least.2 But
there is also, many writers have supposed, something else which
may be fitly described as objective probability; and there is,
besides, a long tradition in favour of the view that it is this
(whatever it may be) which is logically and philosophically important,
subjective probability being a vague and mainly psychological
conception about which there is very little to be said.

The distinction exists already in Hume: "Probability is of
two kinds, either when the object is really in itself uncertain,
and to be determined by chance; or when, though the object be
already certain, yet 'tis uncertain to our judgment, which finds
a number of proofs on each side of the question."3 But the
distinction is not elucidated, and one can only infer from other
passages that Hume did not intend to imply in this passage the
existence of objective chance in a sense contradictory to a
determinist theory of the Universe. In Condorcet all is confused; and
in Laplace nearly all. In the nineteenth century the distinction
begins to grow explicit in the writings of Cournot. "The
explanations which I have given . . . ," he writes in the preface to his
Exposition, "on the double sense of the word probability, which
sometimes refers to a certain measure of our knowledge, and
sometimes to a measure of the possibility of things, independently
of the knowledge that we have of them: these explanations, I say,
seem to me fitted to resolve the difficulties which have until now
made the whole theory of mathematical probability suspect to
sound minds." It will be worth while to pause for a
moment to consider the ideas of Cournot.

1 This is discussed in Chapter XXV. § 4.

2 D'Alembert, collecting (largely from Hume, many passages being
translated almost verbatim) in the Encyclopédie méthodique the most up-to-date
commonplaces of the subject, found it natural to write: "There is no such thing as
chance, properly speaking; but there is its equivalent: the ignorance in which we
stand of the true causes of events has upon our mind the influence that is
attributed to chance." Compare also the sentences from Spinoza quoted on
p. 117 above.

3 A Treatise of Human Nature, Book ii. part iii. section ix.
3. Cournot, while admitting that there is such a thing as
subjective chance, was concerned to dispute the opinion that chance
is merely the offspring of ignorance, saying that in this case
"the calculus of chances" is merely "a calculus of illusions."
The chance, upon which "the calculus of chances" is based, is
something different, and depends, according to him, on the
combination or convergence of phenomena belonging to independent
series. By "independent series" he means series of phenomena
which develop as parallel or successive series without any causal
interdependence or link of solidarity whatever.1 No one, he
says by way of example, seriously believes that in striking the
ground with his foot he puts out the navigator in the Antipodes,
or disturbs the system of Jupiter's satellites. Separate trains of
events, that is to say, have been set going by distinct initial acts of
creation, so to speak.2 Every event is causally connected with
previous events belonging to its own series, but it cannot be
modified by contact with events belonging to a different series.
A 'chance' event is a complex due to the concurrence in time
or place of events belonging to causally independent series.

This theory, as it stands, is evidently unsatisfactory. Even
if there are series of phenomena which are independent in Cournot's
sense, it is not clear how we can know which they are, or how we
can set up a calculus which presumes an acquaintance with them.
Just as it is likely that we are all cousins if we go back far enough,
so there may be, after all, remote relationships between ourselves
and Jupiter. A remote connection or a reaction quantitatively
small is a matter of degree and not by any means the same thing
as absolute independence. Nevertheless Cournot has contributed
something, I think, to the stock of our ideas. He has

1 "The word chance," Cournot writes in his Essai sur les fondements de nos
connaissances, "does not indicate a substantial cause, but an idea: this idea is
that of the combination of several series of causes or of facts, each of which
develops in its own series, independently of the others." This is very like the
definition given by Jean de la Placette in his Traité des jeux de hasard, to which
Cournot refers: "For my part, I am persuaded that chance contains something
real and positive, namely a concurrence of two or more contingent events, each
of which has its causes, but in such a way that their concurrence has none that
is known."

2 Essai sur les fondements de nos connaissances, i.: "Nature is not governed by
a single law . . . its laws are not all derived one from another, or all derived
from one higher law by a purely logical necessity . . . we must, on the contrary,
conceive of them as having been capable of being decreed separately in an
infinity of ways."
hinted at, even if he has not disentangled, one of the elements
in a common conception of chance; and of the notion, which he
seems to have in his mind, we must in due course take account.1

4. In the writings of Condorcet, I have said above, all is
confused. But in Bertrand's criticism of him a relevant distinction,
though not elucidated, is brought before the mind. "The
motives for believing," wrote Condorcet, "that, from ten million
white balls mixed with one black, it will not be the black ball
which I shall draw at the first attempt is of the same kind as the
motive for believing that the sun will not fail to rise to-morrow."
"The assimilation of the two cases," Bertrand writes in criticism
of the above,2 "is not legitimate: one of the probabilities is
objective, the other subjective. The probability of drawing
the black ball at the first attempt is 1/10,000,001, neither more nor
less. Whoever evaluates it otherwise makes a mistake. The
probability that the sun will rise varies from one mind to another.
A scientist might hold on the basis of a false theory, without being
utterly irrational, that the sun will soon be extinguished; he
would be within his rights, just as Condorcet is within his; both
would exceed their rights in accusing of error those who think
differently." Before commenting on this distinction, let us have
before us also some interesting passages by Poincare.
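Bertrand's figure in the passage quoted above is simple arithmetic: ten million white balls and one black make 10,000,001 balls in all. The following check is a minimal sketch. It assumes, as Bertrand's "neither more nor less" implies, that each ball is equally likely to be drawn at the first attempt. The variable names are illustrative.

```python
from fractions import Fraction

white_balls = 10_000_000
black_balls = 1

# Assumption: each of the balls is equally likely to be drawn at the first attempt.
p_black = Fraction(black_balls, white_balls + black_balls)
p_white = 1 - p_black

print(p_black)  # 1/10000001
print(p_white)  # 10000000/10000001
```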

5. We certainly do not use the term 'chance,' Poincare points
out, as the ancients used it, in opposition to determinism. For
us therefore the natural interpretation of 'chance' is subjective:

"Chance is only the measure of our ignorance. Fortuitous
phenomena are, by definition, those, of the laws of which we are

1 Cournot's work on Probability has been highly praised by authorities as
diverse and distinguished as Boole and Von Kries, and has been made the
foundation of a school by some recent French philosophers (see the special
number of the Revue de métaphysique et de morale, devoted to Cournot and
published in 1905, and the bibliography at the end of the present volume passim).
The best account with which I am acquainted, of Cournot's theory of probability,
is to be found in A. Darbon's Le Concept du hasard. Cournot's philosophy of
the subject is developed, not so much in his Exposition de la théorie des chances,
as in later works, especially in his Essai sur les fondements de nos connaissances.
Cournot never touched any subject without contributing something to it, but,
on the whole, his work on Probability is, in my opinion, disappointing. No
doubt his Exposition is superior to other French text-books of the period, of
which there is so large a variety, and his work, both here and elsewhere, is not
without illuminating ideas; but the philosophical treatment is so confused and
indefinite that it is difficult to make much of it beyond the one specific point
treated above.

2 Calcul des probabilités, p. xix.
ignorant." But Poincare immediately adds: "Is this definition
very satisfactory? When the first Chaldaean shepherds followed
with their eyes the movements of the stars, they did not yet
know the laws of astronomy, but would they have dreamed of
saying that the stars move by chance? If a modern physicist
is studying a new phenomenon, and if he discovers its law on
Tuesday, would he have said on Monday that the phenomenon
was fortuitous?"1

There is also another type of case in which "chance must be
something more than the name we give to our ignorance." Among
the phenomena, of the causes of which we are ignorant, there are
some, such as those dealt with by the manager of a life insurance
company, about which the calculus of probabilities can give real
information. Surely it cannot be thanks to our ignorance,
Poincare urges, that we are able to arrive at valuable conclusions.
If it were, it would be necessary to answer an inquirer thus:
"You ask me to predict the phenomena that will be produced.
If I had the misfortune to know the laws of these phenomena, I
could not succeed except by inextricable calculations, and I should
have to give up the attempt to answer you; but since I am
fortunate enough to be ignorant of them, I will give you an answer
at once. And, what is more extraordinary still, my answer will
be right." The ignorance of the manager of the life insurance
company as to the prospects of life of his individual
policy-holders does not prevent his being able to pay dividends to his
shareholders.

Both these distinctions seem to be real ones, and Poincare
proceeds to examine further instances in which we seem to
distinguish objectively between events according as they are or
are not due to 'chance.' He takes the case of a cone balanced
upon its tip; we know for certain that it will fall, but not on
which side: chance will determine. "A very small cause, which
escapes our notice determines a considerable effect that we cannot
fail to see, and then we say that that effect is due to chance."
The weather, and the distribution of the minor planets on the
Zodiac, are analogous instances. And what we term 'games of
chance' afford, it has always been recognised, an almost perfect

1 Calcul des probabilités (2nd edition), p. 2. This passage also appears in an
article in the Revue du mois for 1907 and in the author's Science et méthode, of
the English translation of which I have made use above, at the cost of doing
incomplete justice to Poincare's most admirable style.
example. "It may happen that small differences in the initial
conditions produce very great ones in the final phenomena. A
small error in the former will produce an enormous error in the
latter. Prediction becomes impossible, and we have the
fortuitous phenomenon." "The greatest chance is the birth of a great
man. It is only by chance that the meeting occurs of two genital
cells of different sex that contain precisely, each on its side, the
mysterious elements, the mutual reaction of which is destined
to produce genius. . . . How little it would have taken to make
the spermatozoid which carried them deviate from its course.
It would have been enough to deflect it a hundredth part of an
inch, and Napoleon would not have been born and the destinies
of a continent changed. No example can give a better
comprehension of the true character of chance."
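Poincare's point that small differences in the initial conditions produce very great ones in the final phenomena can be made concrete with the doubling map x → 2x mod 1. This is a standard textbook example of sensitive dependence, substituted here for illustration; it is not one of Poincare's own examples. Exact rational arithmetic (Python's fractions module) avoids rounding error, so the divergence shown belongs to the map, not to the computer. The names doubling, side and first_disagreement are hypothetical, and "left" and "right" stand in for the two sides on which the balanced cone may fall.

```python
from fractions import Fraction


def doubling(x):
    """One step of the doubling map x -> 2x mod 1, computed exactly."""
    return (2 * x) % 1


def side(x):
    """Coarse observable outcome: which half of [0, 1) the state lies in."""
    return "left" if x < Fraction(1, 2) else "right"


def first_disagreement(x, y, max_steps):
    """First step at which the two trajectories show different outcomes, or None."""
    for step in range(max_steps):
        if side(x) != side(y):
            return step
        x, y = doubling(x), doubling(y)
    return None


start = Fraction(1, 3)
nudged = start + Fraction(1, 2**41)  # an initial difference of about 4.5e-13

print(first_disagreement(start, start, 100))   # None: identical starts never diverge
print(first_disagreement(start, nudged, 100))  # 40
```

Here the gap between the two states doubles at every step. A difference of 2^-41 therefore grows to 1/2 after 40 steps, and at that point the two "cones" fall on opposite sides. Knowing the initial state to any finite precision fixes the outcome for only a limited number of steps.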

Poincare calls attention next to another class of events, which
we commonly assign to 'chance,' the distinguishing characteristic
of which seems to be that their causes are very numerous and
complex: the motions of molecules of gas, the distribution of
drops of rain, the shuffling of a pack of cards, or the errors of
observation. Thirdly there is the type, usually connected with
one of the first two, and specially emphasised, as we have seen
above, by Cournot, in which something comes about through
the concurrence of events which we regard as belonging to distinct
causal trains: a man is walking along the street and is killed by
the fall of a tile.

6. When we attribute such events, as those illustrated by
Poincare, to chance, we certainly do not mean merely to assert
that we do not know how they arose or that we had no special
reason for anticipating them a priori. So far from this being the
case, we mean to make a definite assertion as to the kind of way
in which they arose; though exactly what we mean to assert
about them it is extremely difficult to say.

Now a careful examination of all the cases in which various
writers claim to detect the presence of 'objective chance'
confirms the view that 'subjective chance,' which is concerned with
knowledge and ignorance, is fundamental, and that so-called
'objective chance,' however important it may turn out to be
from the practical or scientific point of view, is really a special
kind of 'subjective chance' and a derivative type of the latter.
For none of the adherents of 'objective chance' wish to question
the determinist character of natural order; and the possibility
of this objective chance of theirs seems always to depend on the
possibility that a particular kind of knowledge either is ours or
is within our powers and capacity. Let me try to distinguish as
exactly as I can the criterion of objective chance.

7. When we say that an event has happened by chance, we
do not mean that previous to its occurrence the event was, on
the available evidence, very improbable; this may or may not
have been the case. We say, for example, that if a coin falls heads
it is 'by chance,' whereas its falling heads is not at all improbable.
The term 'by chance' has reference rather to the state of our
information about the concurrence of the event considered and
the event premised. The fall of the coin is a chance event if
our knowledge of the circumstances of the throw is irrelevant
to our expectation of the possible alternative results. If the
number of alternatives is very large, then the occurrence of
the event is not only subject to chance but is also very improbable.
In general two events may be said to have a chance
connection, in the subjective sense, when knowledge of the
first is irrelevant to our expectation of the second, and produces
no additional presumption for or against it; when, that is to
say, the probabilities of the propositions asserting them are
independent in the sense defined in Chapter XII. § 8.
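The definition of a chance connection as independence can be illustrated on a small finite case. The sketch below is only an illustration under stated assumptions: the evidence h is taken to make the 36 outcomes of two hypothetical fair dice equally probable, and the helper names `prob` and `chance_connection` are introduced here, not drawn from the text. It simply tests whether a/bh equals a/h.

```python
from fractions import Fraction
from itertools import product

# Assumed evidence h: two fair six-sided dice, all 36 ordered outcomes
# equally probable (a hypothetical example chosen for illustration).
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event, given=lambda outcome: True):
    """Probability of `event` on h together with the proposition `given`."""
    admissible = [o for o in OUTCOMES if given(o)]
    favourable = [o for o in admissible if event(o)]
    return Fraction(len(favourable), len(admissible))


def chance_connection(a, b):
    """True when knowledge of b is irrelevant to a, i.e. a/bh == a/h."""
    return prob(a, given=b) == prob(a)


def first_is_six(o):
    return o[0] == 6


def second_is_six(o):
    return o[1] == 6


def total_at_least_ten(o):
    return o[0] + o[1] >= 10


print(chance_connection(second_is_six, first_is_six))
print(chance_connection(second_is_six, total_at_least_ten))
```

The first call reports a chance connection, since knowing the first die leaves the probability of a six on the second at 1/6; the second does not, because learning that the total is at least ten raises that probability from 1/6 to 1/2.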

The above definition deals with chance in the widest sense.
What is the differentia of the narrower group of cases to which
it is desired to apply the term 'objective chance'? The occurrence
of an event may be said to be subject to objective chance,
I think, when it is not only a chance event in the above sense,
but when we also have good reason to suppose that the addition
of further knowledge of a given kind, if it were procurable, would
not affect its chance character. We must consider, that is to say,
the probability which is relative not to actual knowledge but to
the whole of a certain kind of knowledge. We may be able to
infer from our evidence that, even with certain kinds of
additions to our knowledge, the connections between the events
would still be subject to chance in the sense just defined, and
we may be able to infer this without actually having the additional
information in question. If, however complete our
knowledge of certain kinds of things might be, there would still
exist independence between the propositions, the conjunction
of which we are investigating, then we may say there is an
objective sense in which the actual conjunction of these propositions
is due to chance.

8. This is, I think, the right line of inquiry. It remains to
decide what kinds of information must be irrelevant to the
connection in order that the presence of objective chance may
be established.

When we attribute a coincidence to objective chance, we
mean not only that we do not actually know a law of connection,
but, speaking roughly, that there is no law of connection to be
known. And when we say that the occurrence of one alternative
rather than another is due to chance, we mean not only
that we know no principle by which to choose between the
alternatives, but also that no such principle is knowable. This
use of the term closely corresponds to what Venn means by the
term 'casual': "We call a coincidence casual, I apprehend,
when we mean to imply that no knowledge of one of the two
elements, which we can suppose to be practically attainable,
would enable us to expect the other."1

To make this more precise, we must revive our distinction,2
between nomologic knowledge and ontologic knowledge,
between knowledge of laws and knowledge of facts or existence.
Given certain facts f(a) about a and certain laws of connection, L,
we can infer certainly or probably other facts φ(a) about a. If
a complete knowledge of laws of connection together with f(a)
yields no appreciable probability for preferring φ(a) to other
alternatives, then I suggest that an actual connection between φ
and f in a particular instance may be said to be due to chance in
a sense which usage justifies us in calling objective. We do
not, in fact, when we speak of objective chance, always use the term
in so strict a sense as this, but this is, I think, the underlying
conception to which current usage approximates. Current
usage diverges from this sense mainly for two reasons. We
speak of objective chance if in the above conditions our
grounds for preference, though appreciable, are very small; and
we are not insistent to assert the rule of chance if a comparatively
slight addition to our ontologic knowledge would render the
probability or the grounds for preference appreciable.

1  Logic  of  Chance,  p.  245. 
2 See Part III. Note (ii.) § 2, p. 275.


CH.  xxiv          PHILOSOPHICAL  APPLICATIONS  289 

To sum up the above, an event is due to objective chance if,
in order to predict it, or to prefer it to alternatives, at present
equi-probable, with any high degree of probability, it would be
necessary to know a great many more facts of existence about
it than we actually do know, and if the addition of a wide
knowledge of general principles would be of little use.

It must be added that we make a distinction between facts of
existence which are highly variable from case to case and those
which are constant or nearly constant over a certain field of
observation or experience. Within the limits of this field we
regard the permanent facts of existence as being, from the standpoint
of chance, in nearly the same position as laws. A connection
is not due to chance, therefore, if a knowledge of the permanent
facts of existence could lead to its prediction.

To sum up again, therefore: if, within a given field of observation
or experience, a knowledge of those facts of existence which are
permanent or invariable within that field, together with a knowledge
of all the relevant fundamental causal laws or general
principles, and of a few other facts of existence, would not
permit us, given f(a), to attribute an appreciable probability to
φ(a) (or an appreciable probability to the alternative φ₁(a)
rather than φ₂(a)), then the conjunction of φ(a) (or of φ₁(a)
rather than φ₂(a)) with f(a) is due to objective chance.

9. If we return to the examples of Poincaré, the above definition
appears to conform satisfactorily with the usages of common
sense. It is when an exact knowledge of fact, as distinguished
from principle, is required for even approximate prediction that
the expression 'objective chance' seems applicable. But
neither our definition nor usage is precise as to the amount of
knowledge of fact which must be required for prediction, in
order that, in the absence of it, the event may be regarded as
subject to objective chance.

It may be added that the expression 'chance' can be used
with reference to general statements as well as to particular facts.
We say, for example, that it is a matter of chance if a man dies
on his birthday, meaning that, as a general principle and in the
absence of special information bearing on a particular case, there
is no presumption whatever in favour of his dying on his birthday
rather than on any other day. If as a general rule there were celebrations
on such a day as would be not unlikely to accelerate
death, we should say that a man's dying on his birthday was not
altogether a matter of chance. If we knew no such general rule
but did not know enough about birthdays to be assured that there
was no such rule, we could not call the chance 'objective'; we
could only speak of it thus if on the evidence before us there was a
strong presumption against the existence of any such general rule.

10. The philosophical and scientific importance of objective
chance as defined above cannot be made plain until Part V., on
the Foundations of Statistical Inference, has been reached. There
it will appear in more than one connection, but chiefly in connection
with the application of Bernoulli's formula. In cases where
the use of this formula is valid, important inferences can be drawn;
and it will be shown that, when the conditions for objective chance
are approximately satisfied, it is probable that the conditions
for the application of Bernoulli's formula will be approximately
satisfied also.

11. The term random has been used, it is well recognised, in
several distinct senses. Venn1 and other adherents of the
'frequency' theory have given to it a precise meaning, but one
which has avowedly very little relation to popular usage. A
random sample, says Peirce,2 is one "taken according to a precept
or method, which, being applied over and over again indefinitely,
would in the long run result in the drawing of any one set of instances
as often as any other set of the same number." The
same fundamental idea has been expressed with greater precision
by Professor Edgeworth in connection with his investigations
into the law of error.3 It is a fatal objection, in my opinion, to
this mode of defining randomness, that in general we can only
know whether or not we have a random sample when our knowledge
is nearly complete. Its divergence from ordinary usage is
well illustrated by the fact that there would be perfect randomness
in the distribution of stars in the heavens, as Venn explicitly points
out, if they were disposed in an exact and symmetrical pattern.4

1 Logic of Chance, chap. v., "The Conception of Randomness and its Scientific
Treatment."

2 "A Theory of Probable Inference" (published in Johns Hopkins Studies in
Logic), p. 152.

3 "Law of Error," Camb. Phil. Trans., 1904, p. 128.

4 But it may be added that this seems inconsistent with Venn's conception
of randomness as that of aggregate order and individual irregularity; nor is it
concordant with Venn's typically random diagram (p. 118). His usage, therefore,
is sometimes nearer than his definition to the popular usage.
I do not believe, therefore, that this kind of definition is a
useful one. The term must be defined with reference to probability,
not to what will happen "in the long run"; though
there may be two senses of it, corresponding to subjective and
objective probability respectively.

The most important phrase in which the term is used is that
of 'a random selection' or 'taken at random.' When we apply
this term to a particular member of a series or collection of
objects, we may mean one of two things. We may mean that
our knowledge of the method of choosing the particular member
is such that a priori the member chosen is as likely to be any
one member of the series as any other. We may also mean,
not that we have no knowledge as to which particular member
is in question, but that such knowledge as we have respecting
the particular member, as distinguished from other members of
the series, is irrelevant to the question as to whether or not
this member has the characteristic under examination. In the
first case the particular member is a random member of the
series for all characteristics; in the second case it is a random
member for some only. As the second case is the more general,
we had better take that for the purpose of defining 'random
selection.'

The point will be brought out further if we discuss the
more difficult use of the term. What exactly do we mean by
the statement: "Any number, taken at random, is equally
likely to be odd or even"? According to the frequency theory,
this simply means that there are as many odd numbers as there
are even. Taking it in a sense corresponding to subjective
chance (and to the explanations given above), I propose as
a definition the following: a is taken at random from the
class S for the purposes of the propositional function S(x)·φ(x),
relative to evidence h, if 'x is a' is irrelevant to the probability
φ(x)/S(x)·h. Thus 'the number of the inhabitants of France is
odd' is, relative to my knowledge, a random instance of the
propositional function 'x is an odd number,' since 'a is the
number of the inhabitants of France' is irrelevant to the probability
of 'a is odd.'1 Thus to say that a number taken at
random is as likely to be odd as even, means that there is a

1 In the above S(x) stands for 'x is a number,' φ(x) stands for 'x is odd,'
a stands for 'the number of inhabitants of France.'
probability ½ that any instance taken at random of the
generalisation 'all numbers are odd' (or of the corresponding
generalisation 'all numbers are even') is true; an instance being
taken at random in respect of evenness or oddness if our
knowledge about it satisfies the conditions defined above.
Whether or not a given instance is taken at random depends,
therefore, upon what generalisation is in question.

12. We may or may not have reason to believe that, if we take
a series of random selections, the proportionate number of
occurrences of one particular type of result will very probably
lie within certain limits. For reasons to be explained in Chapter
XXIX., random selection relative to such information may
conveniently be termed 'random selection under Bernoullian
conditions.' It is this kind of random selection which is scientifically
and statistically important. But, as this corresponds to
'objective chance,' it is convenient to have a wider definition
of 'random selection' unqualified, corresponding to 'subjective
chance'; and it is this wider definition which is given above.
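The phrase "will very probably lie within certain limits" can be given numbers once Bernoulli's formula is assumed. The sketch below is a hedged illustration, not the treatment deferred to Chapter XXIX.: it assumes n independent selections, each with the same probability p of the type of result in question, and sums the binomial terms for which the proportion k/n falls between two limits. The function name and the figures used are hypothetical.

```python
from math import comb


def prob_proportion_within(n, p, low, high):
    """Binomial probability that the proportion k/n of results of the given
    type lies in [low, high], assuming n independent selections each with
    the same probability p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if low <= k / n <= high
    )


# Hypothetical case: p = 1/2, limits 0.45 to 0.55, growing numbers of selections.
for n in (10, 100, 1000):
    print(n, round(prob_proportion_within(n, 0.5, 0.45, 0.55), 4))
```

On these assumptions the probability rises from about a quarter at n = 10 towards near-certainty at n = 1000, which is the sense in which a proportion may "very probably" lie within fixed limits.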

The term opposite to 'random selection' in ordinary usage
is 'biassed selection.' When I use this phrase without qualification
I shall use it as the opposite of 'random selection' in the
wider unqualified sense.


CHAPTER   XXV 

SOME    PROBLEMS    ARISING   OUT   OF  THE    DISCUSSION    OF    CHANCE 

1.  THERE  are  two  classical  problems  in  which  attempts  have  been 
made  to  attribute  certain  astronomical  phenomena  to  a  specific 
cause,  rather  than  to  objective  chance  in  some  such  sense  as  has 
been  defined  in  the  preceding  chapter. 

The first of these is concerned with the inclinations to the
ecliptic of the orbits of the planets of the solar system. This
problem has a long history, but it will be sufficient to take De
Morgan's statement of it.1 If we suppose that each of the orbits
might have any inclination, we obtain a vast number of combinations
of which only a small number are such that their sum is as
small as, or smaller than, the sum of those of the actual system.
But the very existence of ourselves and our world can be shown
to imply that one of this small number has been selected, and
De Morgan derives from this an enormous presumption that
"there was a necessary cause in the formation of the solar system
for the inclinations being what they are."
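The shape of De Morgan's calculation can be sketched under an assumption the text does not spell out: that each of n inclinations is independently and uniformly distributed between 0° and 90°. On that assumption the chance that their sum is no greater than some value s follows the Irwin-Hall distribution. The planet count and total used below are hypothetical, not the actual solar-system figures, and the function names are illustrative. A second helper shows that a band of the same width placed anywhere else is exactly as improbable, which is the substance of the reply to De Morgan.

```python
from math import comb, factorial, floor

MAX_INCLINATION = 90.0  # assumed range of each inclination, in degrees


def prob_sum_at_most(s, n, max_incl=MAX_INCLINATION):
    """P(sum of n independent inclinations <= s degrees), each uniform on
    [0, max_incl]: the Irwin-Hall distribution function, rescaled."""
    x = s / max_incl
    if x <= 0:
        return 0.0
    if x >= n:
        return 1.0
    total = sum((-1) ** k * comb(n, k) * (x - k) ** n for k in range(floor(x) + 1))
    return total / factorial(n)


def prob_all_in_band(low, high, n, max_incl=MAX_INCLINATION):
    """P(every one of n inclinations lies in [low, high]); only the width matters."""
    return ((high - low) / max_incl) ** n


# Hypothetical figures: six orbits whose inclinations total 20 degrees.
print(prob_sum_at_most(20.0, 6))
# Any band of equal width is equally improbable, wherever it is placed.
print(prob_all_in_band(0.0, 5.0, 6), prob_all_in_band(40.0, 45.0, 6))
```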

The answer to this was pointed out by D'Alembert2 in criticising
Daniel Bernoulli. De Morgan could have reached a similar
result whatever the configuration might have happened to be.
Any arbitrary disposition over the celestial sphere is vastly
improbable a priori, that is to say in the absence of known laws
tending to favour particular arrangements. It does not follow
from this, as De Morgan argues, that any actual disposition
possesses a posteriori a peculiar significance.

1 Article on Probabilities in Encyclopaedia Metropolitana, p. 412, § 46. De
Morgan takes this without acknowledgment from Laplace, Théorie analytique
des probabilités (1st edition), pp. 257, 258. Laplace also allows for the fact
that all the planets move in the same sense as the earth. He concludes: "It
will be seen that the existence of a common cause, which has directed all these
motions in the sense of the sun's rotation and on planes slightly inclined to that
of its equator, is indicated with a probability far higher than that of most of the
historical facts about which we allow ourselves no doubt." Laplace had in his
turn borrowed the example, also without acknowledgment, from Daniel Bernoulli.
See also D'Alembert, Opuscules mathématiques, vol. iv., 1768, pp. 89 and 292.

2 Op. cit. p. 292. "One may certainly wager infinity to one that the planets
would not be found in the same plane; this is no reason to conclude that this
arrangement, if it occurred, would necessarily have any cause other than chance;
for one could likewise wager infinity to one that the planets would not have some
particular arrangement fixed at will. . . ."

D'Alembert is employing the instance for his own purposes, in order to build
up an ad hominem argument in favour of his theory concerning 'runs' against
D. Bernoulli (see also p. 317).

2. The second of these problems is known as Michell's problem
of binary stars. Michell's Memoir was published in the Philosophical
Transactions for 1767.1 It deals with the question whether
stars which are optically double, that is, so situated as to appear
close together to an observer on the earth, are also physically so,
"either by an original act of the Creator, or in consequence of some
general law, such perhaps as gravity." He argues that if the stars
"were scattered by mere chance as it might happen . . . it is manifest
. . . that every star being as likely to be in any one situation as
another, the probability that any one particular star should happen
to be within a certain distance (as, for example, one degree) of any
other given star would be represented . . . by a fraction whose
numerator would be to its denominator as a circle of one degree
radius to a circle whose radius is the diameter of a great circle . . .
that is, about 1 in 13131." From this beginning he derives an
immense presumption against the scattering of the several contiguous
stars that may be observed "by mere chance as it might happen."
And he goes on to argue that, if there are causal laws directly
tending to produce the observed proximities, we may reasonably
suppose that the proximities are actual, and not merely optical
and apparent. The fact that Michell's induction was confirmed
by the later investigations of Herschel adds interest to the
speculation. But apart from this the argument is evidently
subtler than in the first example. Michell argues that there are
more stars optically contiguous than would be likely if there
were no special cause acting towards this end, and further that,
if such a cause is in operation, it must be real, and not merely
optical, contiguity that results from it.

1 See also Todhunter's History, pp. 332-4; Venn, Logic of Chance, p. 260;
Forbes, "On the Alleged Evidence for a Physical Connexion between Stars
forming Binary or Multiple Groups, deduced from the Doctrine of Chances,"
Phil. Mag., 1850; and Boole, "On the Theory of Probabilities, and in particular
on Michell's Problem of the Distribution of the Fixed Stars," Phil. Mag., 1851.
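Michell's figure of about 1 in 13131 can be checked directly. The sketch below assumes, as his "mere chance" hypothesis does, that stars are scattered independently and uniformly over the sphere. `michell_ratio` follows his own description (a circle of one-degree radius compared with a circle whose radius is the diameter of a great circle, that is 360/π degrees); `cap_fraction` gives the exact spherical-cap fraction (1 - cos r)/2 for comparison. The helper `prob_some_neighbour` and any star count passed to it are hypothetical additions, not Michell's.

```python
import math


def michell_ratio(radius_deg=1.0):
    """Michell's planar ratio: circle of radius_deg against a circle whose
    radius is the diameter of a great circle (360 / pi degrees)."""
    great_circle_diameter = 360.0 / math.pi
    return (radius_deg / great_circle_diameter) ** 2


def cap_fraction(radius_deg=1.0):
    """Exact fraction of the sphere lying within radius_deg of a given point."""
    return (1.0 - math.cos(math.radians(radius_deg))) / 2.0


def prob_some_neighbour(n_other_stars, radius_deg=1.0):
    """P(at least one of n other stars, scattered independently and
    uniformly, falls within radius_deg of a given star)."""
    return 1.0 - (1.0 - cap_fraction(radius_deg)) ** n_other_stars


print(round(1 / michell_ratio()))  # Michell's "1 in 13131"
print(round(1 / cap_fraction()))   # exact cap, differing only slightly
print(prob_some_neighbour(1000))   # hypothetical count of other stars
```

Michell's circle-to-circle ratio is a good approximation for a region as small as one degree, which is why the two figures agree to within a fraction of a unit.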

Let us analyse the argument more closely. By "mere chance
as it might happen" Michell cannot be supposed to mean "uncaused."
He is thinking of objective chance in the sense in
which I have defined this in the preceding chapter. We
speak of a chance occurrence when it is brought about by the
coincidence of forces and circumstances so numerous and complex
that knowledge sufficient for its prediction is of a kind altogether
out of our reach. Michell uses the term vaguely but means, I
think, something of this kind: an event is due to mere chance
when it can only occur if a large number of independent1 conditions
are fulfilled simultaneously. The alternatives which
Michell is discussing are therefore these: are binary stars merely
due to the interaction of a vast variety of stellar laws and positions,
or are they the result of a few fundamental tendencies,
which might be the subject of knowledge and which would lead
us to expect such stars in relative profusion?

The existence of numerous binary stars may give a real
inductive argument in favour of their arising out of the interaction
of a relatively small number of independent causes. But
it is not possible to arrive at such precise results as Michell's.
If there is some finite probability a priori that binary stars,
when they arise, do arise in this way, then, since the frequent
coincidence of a given set of independent causes relatively few
in number is more likely than that of a set relatively numerous,
the observation of binary stars will raise this probability a posteriori
to an extent which depends upon the relative profusion
in which such stars appear. If, in short, the first of the two
alternatives proposed above is assumed, there is no greater
presumption for a distribution, covering a part of the heavens,
in which binary stars appear, than for any other distribution;
if the second is assumed, there is a greater presumption. The
observation of numerous distributions in which binary stars
appear increases, therefore, by the inverse principle, any a priori
probability which may exist in favour of the second hypothesis.
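The inverse principle, applied repeatedly, can be sketched in odds form. Assuming, hypothetically, that the observed distributions are independent given each hypothesis, and that each is some fixed factor more probable on the second hypothesis than on the first, every observation multiplies the prior odds by that factor. The figures and the function name are illustrative only. The sketch also makes the proviso in the text visible: the observations can only increase "any a priori probability which may exist", so a zero prior remains zero.

```python
def posterior(prior, likelihood_ratio, observations):
    """Probability of the second hypothesis after `observations` independent
    observations, each `likelihood_ratio` times as probable on that hypothesis
    as on the first (inverse principle in odds form)."""
    if prior in (0.0, 1.0):
        return float(prior)  # a certain or impossible prior is not revised
    odds = prior / (1.0 - prior) * likelihood_ratio ** observations
    return odds / (1.0 + odds)


# Hypothetical figures: a small prior and a modest factor per observation.
for n in (0, 5, 10, 20):
    print(n, round(posterior(0.01, 1.5, n), 3))
print(posterior(0.0, 1.5, 20))
```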

1 See § 3 of Note (ii.) to Part III.
But more than this the argument cannot justify. That Michell's
argument  is,  as  it  stands,  no  more  valid  than  De  Morgan's, 
becomes  plain  when  we  notice  that  he  would  still  have  a  high 
probability  for  his  conclusion  even  if  only  one  binary  star  had 
been  observed.  The  valuable  part  of  the  argument  must  clearly 
turn  upon  the  observation  of  numerous  binary  stars. 

Let  us  now  turn  to  Michell's  second  step.  He  argues  that, 
if  binary  stars  arise  out  of  the  interaction  of  a  small  number  of 
independent  forces,  they  must  be  physically  and  not  merely 
optically  double.  The  force  of  this  argument  seems  to  depend 
upon  our  possessing  previous  knowledge  as  to  the  nature  of  the 
principal  natural  laws,  and  upon  an  assumption,  arising  out  of 
this,  that  there  are  not  likely  to  be  forces  tending  to  arrange 
stars,  in  reality  at  great  distances  from  one  another,  so  as  to 
appear  double  from  this  particular  planet.  But  Michell,  in 
arguing  thus,  was  neglecting  the  possibility  that  the  optical 
connection  between  the  stars  might  be  due  to  the  observer  and 
his  means  of  observation.  It  was  not  impossible  that  there  should 
be  a  law,  connected  with  the  transmission  of  light  for  example, 
which  would  cause  stars  to  appear  to  an  observer  to  be  much 
nearer  together  than  they  really  are. 

While, therefore, a relative profusion of binary stars constitutes
evidence favourably relevant to Michell's conclusion, the argument
is more complex and much less conclusive than he seems to
have supposed. This is a criticism which is applicable to many
such arguments. The simplicity of the evidence, which arises
out of the lack of much relevant information, is liable, unless we
are careful, to lead us into deceptive calculations and into assertions
of high numerical probabilities, upon which we should never
venture in cases where the evidence is full and complicated, but
where, in fact, the conclusion is established far more strongly.
The enormously high probability in favour of his conclusion, to
which Michell's calculations led him, should itself have caused
him to suspect the accuracy of the reasoning by which he
reached it.

3. Some more recent problems of this type seem, however, so
far as I am acquainted with them, to follow safer lines of argument.
The most important are concerned with the existence
of star drifts. It seems to me not at all impossible to possess
data on which a valid argument can be constructed from the



observation  of  optically  apparent  star  drifts  to  the  probability 
of  a  real  uniformity  of  motion  amongst  certain  sets  of  stars 
relatively  to  others. 

Another problem, somewhat analogous to the preceding, has
been recently discussed by Professor Karl Pearson.1 The title
might prove a little misleading, perhaps, until the explanation
has been reached of the sense in which the term 'random' is
used in it. But Professor Pearson uses the term in a perfectly
precise sense. He defines a random distribution as one in which
spherical shells of equal volume about the sun as centre contain
the same number of stars.2 He argues that the observed facts
render probable the following disjunction: either the distribution
of stars is not random in the sense defined above, or there is
a correlation between their distance and their brilliancy, such as
might be produced, for example, by the absorption of light in its
transmission through space, or the space within which they all
lie is limited in volume and not spherical in form.3 But it is
useless to employ the term random in this sense in such inquiries
as Michell's. For there is no reason to suppose that a non-random
distribution is more likely than a random distribution
to depend upon the interaction of a small number of independent
forces, and there might even exist a presumption the other way.
This arbitrary interpretation of randomness does not help us to
the solution of any interesting problem.
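Professor Pearson's definition can be stated operationally, which also shows why it is independent of direction: it looks only at each star's distance from the sun. A minimal sketch, assuming distances measured out to some outer radius R: shells of equal volume are bounded by radii R·(k/n)^(1/3), so a star at distance d falls in shell ⌊n·(d/R)³⌋. The function names and the sample distances are hypothetical.

```python
def equal_volume_radii(outer_radius, n_shells):
    """Radii bounding n_shells concentric shells of equal volume about the sun."""
    return [outer_radius * (k / n_shells) ** (1 / 3) for k in range(n_shells + 1)]


def shell_counts(distances, outer_radius, n_shells):
    """Number of stars in each equal-volume shell; direction is ignored."""
    counts = [0] * n_shells
    for d in distances:
        if not 0 <= d <= outer_radius:
            raise ValueError(f"distance {d} outside [0, {outer_radius}]")
        # The volume fraction lying inside distance d is (d / R) ** 3.
        index = min(int(n_shells * (d / outer_radius) ** 3), n_shells - 1)
        counts[index] += 1
    return counts


# Hypothetical distances in arbitrary units, with R = 1.
print(equal_volume_radii(1.0, 2))
print(shell_counts([0.1, 0.5, 0.9, 0.95], 1.0, 2))
```

The sketch only tallies the counts; deciding whether observed counts are equal enough to call a distribution random, allowing for sampling fluctuation, is left out.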

4. The discussion of final causes and of the argument from
design has suffered confusion from its supposed connection with
theology. But the logical problem is plain and can be determined
upon formal and abstract considerations. The argument is in all
cases simply this: an event has occurred and has been observed
which would be very improbable a priori if we did not know that
it had actually happened; on the other hand, the event is of such
a character that it might have been not unreasonably predicted
if we had assumed the existence of a conscious agent whose
motives are of a certain kind and whose powers are sufficient.

1 "On the Improbability of a Random Distribution of the Stars in Space,"
Proceedings of Royal Society, series A, vol. 84, pp. 47-70. 1910.

2 It is, therefore, independent of direction, and the distribution is random
even if the stars are massed in particular quarters of the heavens. The definition
is, therefore, exceedingly arbitrary.

3 This should run more correctly, I think, "not a sphere with the sun as
centre."
Symbolically: let h be our original data, a the occurrence
of the event, b the existence of the supposed conscious agent.
Then a/h is assumed very small in comparison with a/bh; and
we require b/ah, the probability, that is to say, of b after a is
known. The inverse principle of probability already demonstrated
shows that

$$ b/ah = a/bh \cdot \frac{b/h}{a/h}, $$

and b/ah is therefore not determinate in terms of a/bh and a/h
alone. Thus we cannot measure the probability of the conscious
agent's existence after the event, unless we can measure its
probability before the event.
And  it  is  our  ignorance  of  this,  as  a  rule,  that  we  are  endeavouring 
to  remedy.  The  argument  tells  us  that  the  existence  of  the 
hypothetical  agent  is  more  likely  after  the  event  than  before 
it ;  but,  as  in  the  case  of  the  general  inductive  problem  dealt 
with  in  Part  III.,  unless  there  is  an  appreciable  probability  first, 
there  cannot  be  an  appreciable  probability  afterwards.  No 
conclusion,  therefore,  which  is  worth  having,  can  be  based  on  the 
argument from design alone; like induction, this type of argument
can only strengthen the probability of conclusions, for
which  there  is  something  to  be  said  on  other  grounds.  We  cannot 
say,  for  example,  that  the  human  eye  is  due  to  design  more 
probably  than  not,  unless  we  have  some  reason,  apart  from  the 
nature  of  its  construction,  for  suspecting  conscious  workmanship. 
But  the  necessary  a  priori  probability,  derived  from  some  other 
source,  may  sometimes  be  forthcoming.  The  man  who  upon  a 
desert  island  picks  up  a  watch,  or  who  sees  the  symbol  John 
Smith  traced  upon  the  sand,  can  use  with  reason  the  argument 
from  design.  For  he  has  other  grounds  for  supposing  that 
beings,  capable  of  designing  such  objects,  do  exist,  and  that 
their  presence  on  the  island,  now  or  formerly,  is  appreciably 
possible. 
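A small numerical sketch of why b/ah is not fixed by a/bh and a/h alone. The figures are hypothetical, chosen only for illustration (the text gives none, and holds that such initial probabilities are often not numerically measurable); the function name `posterior` is introduced here and is not the author's notation. Holding a/bh and a/h fixed, two different priors b/h give posteriors differing by the same factor as the priors.

```python
def posterior(a_given_bh: float, b_given_h: float, a_given_h: float) -> float:
    """Inverse principle: b/ah = a/bh * (b/h) / (a/h)."""
    if not 0 < a_given_h <= 1:
        raise ValueError("a/h must lie in (0, 1]")
    # a/h = a/bh * b/h + (a given not-b) * (1 - b/h), so a/h can never fall below a/bh * b/h.
    if a_given_bh * b_given_h > a_given_h:
        raise ValueError("inconsistent data: a/h is smaller than a/bh * b/h")
    return a_given_bh * b_given_h / a_given_h


# Hypothetical values: the event is far likelier if the agent exists (a/bh)
# than on our original data alone (a/h).
a_bh, a_h = 0.9, 0.01

for prior in (0.001, 0.0001):
    print(f"b/h = {prior}: b/ah = {posterior(a_bh, prior, a_h):.4f}")
```

The argument multiplies whatever prior we start with by the same factor a/bh ÷ a/h, so the posterior is appreciable only if the prior was.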

5. The most important problems at the present day, in which
arguments of this kind are employed, are those which arise in
connection with psychical research.1 The analysis of the
'cross-correspondences,' which have played so large a part in recent
discussions, presents many points of difficulty which are not
dissimilar to those which arise in other scientific inquiries of
great complexity in which our initial knowledge is small. An
important part of the logical problem, therefore, is to distinguish
the peculiarity of psychical problems and to discover what special
evidence they demand beyond what is required when we deal with
other questions. There is a certain tendency, I think, arising out
of the belief that psychical problems are in some way peculiar,
to raise sceptical doubts against them which are equally valid
against all scientific proofs. Without entering into any questions
of detail, let us endeavour to separate those difficulties which
seem peculiar to psychical research from those which, however
great, are not different from the difficulties which confront
students of heredity, for instance, and which are not less likely
than these to yield ultimately to the patience and the insight of
investigators.

1 The probability that a remarkable success in naming playing cards is due
to psychic agency was discussed by Professor Edgeworth in Metretike. This
was, I think, the first application of probabilities to these questions. See also
Proceedings of the Society for Psychical Research, Parts VIII. and X.; Professor
Edgeworth's article on Psychical Research and Statistical Method, Stat. Journ.
vol. lxxxii. (1919) p. 222; and Experiments in Psychical Research at Leland
Stanford Junior University, by J. Coover.

For this purpose it is necessary to recur, briefly, to the analysis
of Part III. It was argued there that the methods of empirical
proof, by which we strengthen the probability of our conclusions,
are not at all dissimilar, when we apply them to the discovery
of formal truth, and when we apply them to the discovery of the
laws which relate material objects, and that they may possibly
prove useful even in the case of metaphysics; but that the
initial probability which we strengthen by these means is
differently obtained in each class of problem. In logic it arises out
of the postulate that apparent self-evidence invests what seems
self-evident with some degree of probability; and in physical
science, out of the postulate that there is a limitation to the
amount of independent variety amongst the qualities of material
objects. But both in logic and in physical science we may wish
to consider hypotheses which it is not possible to invest with any
a priori probability and which we entertain solely on account of
the known truth of many of their consequences. An axiom
which has no self-evidence, but which it seems necessary to
combine with other axioms which are self-evident in order to deduce
the generally accepted body of formal truth, stands in this
category. A scientific entity, such as the ether or the electron,
whose qualities have never been observed but whose existence we
postulate for purposes of explanation, stands in it also. If the
analysis of Part III. is correct, we can never attribute a finite
probability1 to the truth of such axioms or to the existence of
such scientific entities, however many of their consequences
we find to be true. They may be convenient hypotheses, because,
if we confine ourselves to certain classes of their consequences,
we are not likely to be led into error; but they stand,
nevertheless, in a position altogether different from that of such
generalisations as we have reason to invest with an initial probability.

Let  us  now  apply  these  distinctions  to  the  problems  of  psychical 
research.  In  the  case  of  some  of  them  we  can  obtain  the  initial 
probability,  I  think,  by  the  same  kind  of  postulates  as  in  physical 
science,  and  our  conclusions  need  not  be  open  to  a  greater  degree 
of  doubt  than  these.  In  the  case  of  others  we  cannot  ;  and  these 
must  remain,  unless  some  method  is  open  to  us  peculiar  to 
psychical  research,  as  tentative  unproved  hypotheses  in  the 
same  category  as  the  ether. 

The  best  example  of  the  first  class  is  afforded  by  telepathy. 
We  know  that  the  consciousnesses  which,  if  our  hypothesis  is 
correct, act upon one another, do exist; and I see no logical
difference between the problem of establishing a law of telepathy and
that  of  establishing  the  law  of  gravitation.  There  is  at  present  a 
practical  difference  on  account  of  the  much  narrower  scope  of  our 
knowledge,  in  the  case  of  telepathy,  of  cognate  matters.  We  can, 
therefore,  be  much  less  certain  ;  but  there  seems  no  reason  why 
we  should  necessarily  remain  less  certain  after  more  evidence 
has  been  accumulated.  It  is  important  to  remember  that,  in 
the case of telepathy, we are merely discovering a relation
between objects which we already know to exist.

The best example of the other class is afforded by attempts
to attribute psychic phenomena to the agency of 'spirits' other
than human beings. Such arguments are weakened at present
by the fact that no phenomena are known, so far as I am aware,
which cannot be explained, though improbably in some cases,
in other ways. But even if phenomena were to be observed of
which no known agency could afford even an improbable
explanation, the hypothesis of 'spirits' would still lie in the same
logical limbo as the hypothesis of the 'ether,' in which they
might be supposed not inappropriately to move.

1 I am assuming that there is no argument, arising either from self-evidence
or analogy, in addition to the argument arising from the truth of their
consequences, in favour of the truth of such axioms or the existence of such objects;
but I daresay that this may not certainly be the case. The reader may be
reminded also that, when I deny a finite probability, this is not the same thing as
to affirm that the probability is infinitely small. I mean simply that it is not
greater than some numerically measurable probability.

Such an hypothesis as the existence of 'spirits' could only
become substantial if some peculiar method of knowledge were
within our power which would yield us the initial probability
which is demanded. That such a method exists, it is not
infrequently claimed. If we can directly perceive these 'spirits,'
as many of those who are described in James's Varieties of
Religious Experience think they can, the problem is, logically,
altogether changed. We have, in fact, very much the same kind
of reason, though it may be with less probability, that we have
for believing in the existence of other people. The preceding
paragraph applies only to attempts at proving the existence of
'spirits' from such evidence as is discussed by the Society for
Psychical Research.

In  between  these  two  extremes  comes  a  class  of  cases,  with 
regard  to  which  it  is  extremely  difficult  to  come  to  a  decision — 
that of attempts to attribute psychic phenomena to the conscious
agency of the dead. I wish to discuss here, not the nature of the
existing  evidence,  but  the  question  whether  it  is  possible  for 
any  evidence  to  be  convincing.  In  this  case  the  object  whose 
existence  we  are  endeavouring  to  demonstrate  resembles  in 
many  respects  objects  which  we  know  to  exist.  The  question 
of  epistemology,  which  is  before  us,  is  this  :  Is  it  necessary,  in 
order  that  we  may  have  an  initial  probability,  that  the  object  of 
our  hypothesis  should  resemble  in  every  relevant  particular 
some  one  object  which  we  know  to  exist,  or  is  it  sufficient  that  we 
should  know  instances  of  all  its  supposed  qualities,  though  never 
in combination? It is clear that some qualities may be irrelevant
(position in time and space, for example) and that 'every
relevant particular' need not include these. But can the initial
probability  exist  if  our  hypothesis  assumes  qualities,  which  have 
plainly  some  degree  of  relevance,  in  new  combinations  ?  If  we 
have  no  knowledge  of  consciousness  existing  apart  from  a  living 
body,  can  indirect  evidence  of  whatever  character  afford  us  any 
probability  of  such  a  thing  ?  Could  any  evidence,  for  example, 
persuade  us  that  a  tree  felt  the  emotion  of  amusement,  even  if 
it  laughed  repeatedly  when  we  made  jokes  ?  Yet  the  analogy 
which we demand seems to be a matter of degree; for it does not
seem  unreasonable  to  attribute  consciousness  to  dogs,  although 
this  constitutes  a  combination  of  qualities  unlike  in  many  respects 
to  any  which  we  know  to  exist. 

This  discussion,  however,  is  wandering  from  the  subject  of 
probability  to  that  of  epistemology,  and  it  will  not  be  solved  until 
we  possess  a  more  comprehensive  account  of  this  latter  subject 
than  we  have  at  present.  I  wish  only  to  distinguish  between  those 
cases  in  which  we  obtain  the  initial  probability  in  the  same 
manner  as  in  physical  science  from  those  in  which  we  must  get 
it,  if  at  all,  in  some  other  way.  The  distinctions  I  have  made 
are  sufficiently  summarised  by  a  recapitulation  of  the  following 
comparisons  :  We  compared  the  proof  of  telepathy  to  the  proof 
of  gravitation,  the  proof  of  non-human  '  spirits  '  to  the  proof 
of  the  ether,  and,  much  less  closely,  the  proof  of  the  consciousness 
of  the  dead  to  the  proof  of  the  consciousness  of  trees,  or,  perhaps, 
of  dogs. 

Before  passing  to  the  next  of  the  rather  miscellaneous  topics 
of  this  chapter,  it  may  be  worth  while  to  add  that  we  should  be 
very  chary  of  applying  to  problems  of  psychical  research  the 
calculus  of  probabilities.  The  alternatives  seldom  satisfy  the 
conditions  for  the  application  of  the  Principle  of  Indifference, 
and  the  initial  probabilities  are  not  capable  of  being  measured 
numerically. If, therefore, we endeavour to calculate the
probability that some phenomenon is due to 'abnormal' causes,
our  mathematics  will  be  apt  to  lead  us  into  unjustifiable 
conclusions. 

6. Uninstructed common sense seems to be specially unreliable
in dealing with what are termed 'remarkable occurrences.'
Unless a 'remarkable occurrence' is simply one which produces
on us a particular psychological effect, that of surprise, we can
only define it as an event which before its occurrence is very
improbable on the available evidence. But it will often occur
(whenever, in fact, our data leave open the possibility of a large
number of alternatives and show no preference for any of them)
that every possibility is exceedingly improbable a priori. It
follows, therefore, that what actually occurs does not derive any
peculiar significance merely from the fact of its being 'remarkable'
in the above sense. Something further is required before we
can build with success. Yet Michell's argument and the argument
from design derive a good deal of their plausibility, I think,
from the 'remarkable' character of the actual constitution
whether of the heavens or of the universe, in forgetfulness of the
fact that it is impossible to propound any constitution which
would if it existed be other than 'remarkable.' It is supposed
that a remarkable occurrence is specially in need of an explanation,
and that any sufficient explanation has a high probability
in its favour. That an explanation is particularly required
possesses a measure of truth; for it is likely that our original
data were much lacking in completeness, and the occurrence of
the extraordinary event brings to light this deficiency. But
that we are not justified in adopting with confidence any sufficient
explanation has been shown already.
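The claim that every possibility may be exceedingly improbable a priori can be seen in the simplest case of many alternatives with no preference among them. This sketch assumes tosses of a fair coin (a hypothetical illustration, not an example from the text): the 'remarkable' run of all heads and an arbitrary-looking sequence have exactly the same probability, and every one of the possible sequences shares it.

```python
from fractions import Fraction
from itertools import product

N = 10  # number of tosses in this hypothetical illustration


def probability_of(sequence: str) -> Fraction:
    """A priori probability of one exact sequence of tosses, each H or T with probability 1/2."""
    return Fraction(1, 2) ** len(sequence)


remarkable = "H" * N       # looks 'remarkable'
ordinary = "HTTHTHHTHT"     # looks unremarkable

print(probability_of(remarkable), probability_of(ordinary))

# Every one of the 2**N sequences has this same small probability;
# together they exhaust the alternatives, so the probabilities sum to 1.
total = sum(probability_of("".join(s)) for s in product("HT", repeat=N))
print(total)
```

Exact fractions are used so that the equality of the two probabilities is not obscured by floating-point rounding.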

Such arguments, however, get a part of their plausibility from
a quite different source. There is a general supposition that some
kinds of occurrences are more likely than others to be susceptible
of an explanation by us; and, therefore, any explanation which
deals with such cases falls in prepared soil. Results which,
judging from ourselves, conscious agents would be likely to
produce fall into this category. Results which would be probable,
supposing a direct and predominant causal dependence between
the elements whose concomitance is remarked, belong to it also.
There is, in fact, a sort of argument from analogy as to whether
certain sorts of phenomena are or are not likely to be due to
'chance.' This may explain, for example, why the particular
concurrence of atoms that go to compose the human eye, why a
series of correct guesses in naming playing cards, why special
symmetry or special asymmetry amongst the stars, seem to
require explanation in no ordinary degree. Prior to an
explanation these particular concurrences or series or distributions are
no more improbable than any other. But the causes of such
conjunctions as these are more likely to be discoverable by the
human mind than are the causes of others, and the attempt to
explain them deserves, therefore, to be more carefully considered.
This supposition, derived by analogy or induction from those
cases in which we believe the causes to be known to us, has,
perhaps, some weight. But the direct application of the Calculus
of Probabilities can do no more in these cases than suggest matter
for investigation. The fact that a man has made a long series
of correct guesses in cases where he is cut off from the ordinary
channels of communication is a fact worthy of investigation,
because it is more likely to be susceptible of a simple causal
explanation, which may have many applications, than a case in
which false and true guesses follow one another with no apparent
regularity.

7.  In  the  case  of  empirical  laws,  such  as  Bode's  law,  which  have 
no  more  than  a  very  slight  connection  with  the  general  body  of 
scientific  knowledge,  it  is  sometimes  thought  that  the  law  is  more 
probable  if  it  is  proposed  before  the  examination  of  some  or  all  of 
the available instances than if it is proposed after their
examination. Supposing, for example, that Bode's law is accurately
true  for  seven  planets,  it  is  held  that  the  law  would  be  more 
probable  if  it  was  suggested  after  the  examination  of  six  and 
was  confirmed  by  the  subsequent  discovery  of  the  seventh,  than 
it  would  be  if  it  had  not  been  propounded  until  after  all  seven 
had  been  observed.  The  arguments  in  favour  of  such  a  conclusion 
are well put by Peirce:1 "All the qualities of objects may be
conceived  to  result  from  variations  of  a  number  of  continuous 
variables  ;  hence  any  lot  of  objects  possesses  some  character  in 
common,  not  possessed  by  any  other."  Hence  if  the  common 
character  is  not  predesignate  we  can  conclude  nothing.  Cases 
must  not  be  used  to  prove  a  generalisation  which  has  only  been 
suggested  by  the  cases  themselves.  He  takes  the  first  five  poets 
from  a  biographical  dictionary  with  their  ages  at  death  : 

Aagard . . . 48
Abeille . . . 76
Abulola . . . 84
Abunowas . . .
Accords . . . 45

"These five ages have the following characters in common:
"1. The difference of the two digits composing the number, divided by three, leaves a remainder of one.
"2. The first digit raised to the power indicated by the second, and then divided by three, leaves a remainder of one.
"3. The sum of the prime factors of each age, including one as a prime factor, is divisible by three."
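Peirce's three 'characters' can be checked mechanically. This sketch uses the four ages legible in this copy (48, 76, 84, 45); the age given for Abunowas is not legible here and is left out. Two readings are assumptions of this sketch: 'difference' is taken as the absolute difference of the digits, and prime factors are counted with multiplicity (76 = 2 × 2 × 19 satisfies character 3 only on that reading). The last lines count how many two-digit ages share all three characters, which bears on how little such a post hoc common character signifies.

```python
def digits(age: int) -> tuple[int, int]:
    """Split a two-digit age into (first digit, second digit)."""
    return divmod(age, 10)


def prime_factors(n: int) -> list[int]:
    """Prime factors of n by trial division, repeated according to multiplicity."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


def character_1(age: int) -> bool:
    """Difference of the two digits leaves remainder 1 when divided by 3."""
    first, second = digits(age)
    return abs(first - second) % 3 == 1


def character_2(age: int) -> bool:
    """First digit raised to the power of the second leaves remainder 1 when divided by 3."""
    first, second = digits(age)
    return pow(first, second, 3) == 1


def character_3(age: int) -> bool:
    """Sum of the prime factors, counting 1 as a factor, is divisible by 3."""
    return (1 + sum(prime_factors(age))) % 3 == 0


def has_all(age: int) -> bool:
    return character_1(age) and character_2(age) and character_3(age)


ages = {"Aagard": 48, "Abeille": 76, "Abulola": 84, "Accords": 45}
for name, age in ages.items():
    print(name, age, character_1(age), character_2(age), character_3(age))

# How many two-digit ages share all three characters?
sharing = [n for n in range(10, 100) if has_all(n)]
print(len(sharing), "of 90 two-digit ages share all three characters")
```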
He compares a generalisation regarding the ages of poets based
on this evidence to Dr. Lyon Playfair's argument about the
specific gravities of the three allotropic forms of carbon:

Diamond . . . $3.48 = \sqrt{12}$
Graphite . . . $2.29 = \sqrt[3]{12}$
Charcoal . . . $1.88 = \sqrt[4]{12}$

approximately, the atomic weight of carbon being 12. Dr.
Playfair thinks that the above renders it probable that the specific
gravities of the allotropic forms of other elements would, if we
knew them, be found to equal the different roots of their atomic
weight.

1 C. S. Peirce, A Theory of Probable Inference, pp. 162-167; published in
Johns Hopkins Studies in Logic, 1883.
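The closeness of Playfair's fit is easy to check. This sketch reads the specific gravities in his table as 3.48, 2.29 and 1.88 (the first figure is partly illegible in this copy, so treat it as an assumption) and compares each with the corresponding root of 12. The agreement is close, but it rests on three instances and on a free choice of which root to pair with which form. That is exactly the kind of weak, isolated induction the text goes on to discuss.

```python
ATOMIC_WEIGHT = 12  # atomic weight of carbon, as given in the text

# Specific gravities as read from Playfair's table, paired with the root index
# he assigns to each form (square, cube and fourth root).
forms = {"Diamond": (3.48, 2), "Graphite": (2.29, 3), "Charcoal": (1.88, 4)}

relative_error = {}
for form, (gravity, k) in forms.items():
    root = ATOMIC_WEIGHT ** (1 / k)
    relative_error[form] = abs(gravity - root) / root
    print(f"{form:<9} observed {gravity:.2f}   12^(1/{k}) = {root:.3f}   "
          f"relative error {relative_error[form]:.1%}")
```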

The  weakness  of  these  arguments,  however,  has  a  different 
explanation.  These  inductions  are  very  improbable,  because  they 
are  out  of  relation  to  the  rest  of  our  knowledge  and  are  based  on 
a  very  small  number  of  instances.  The  apparent  absurdity, 
moreover,  of  the  inductive  law  of  Poets'  Ages  is  increased  by  the 
fact  that  we  take  account  of  the  knowledge  we  actually  possess 
that  the  ages  of  poets  are  not  in  fact  connected  by  any  such  law. 
If  we  knew  nothing  whatever  about  poets'  ages  except  what  is 
stated  above,  the  induction  would  be  as  valid  as  any  other  which 
is  based  on  a  very  weak  analogy  and  a  very  small  number  of 
instances  and  is  unsupported  by  indirect  evidence. 

The peculiar virtue of prediction or predesignation is altogether
imaginary. The number of instances examined and the analogy
between them are the essential points, and the question as to
whether a particular hypothesis happens to be propounded before
or after their examination is quite irrelevant. If all our
inductions had to be thought of before we examined the cases to
which we apply them, we should, doubtless, make fewer
inductions; but there is no reason to think that the few we should make
would be any better than the many from which we should be
precluded. The plausibility of the argument is derived from a
different source. If an hypothesis is proposed a priori, this
commonly means that there is some ground for it, arising out of
our previous knowledge, apart from the purely inductive ground,
and if such is the case the hypothesis is clearly stronger than one
which reposes on inductive grounds only. But if it is a mere
guess, the lucky fact of its preceding some or all of the cases which
verify it adds nothing whatever to its value. It is the union of
prior knowledge, with the inductive grounds which arise out of
the  immediate  instances,  that  lends  weight  to  an  hypothesis,  and 
not  the  occasion  on  which  the  hypothesis  is  first  proposed.  It  is 
sometimes  said,  to  give  another  example,  that  the  daily  fulfilment 
of  the  predictions  of  the  Nautical  Almanack  constitutes  the  most 
cogent  proof  of  the  laws  of  dynamics.  But  here  the  essence  of 
the  verification  lies  in  the  variety  of  cases  which  can  be  brought 
accurately  under  our  notice  by  means  of  the  Almanack,  and  in 
the  fact  that  they  have  all  been  obtained  on  a  uniform  principle, 
not  in  the  fact  that  the  verification  is  preceded  by  a  prediction. 

The same point arises not uncommonly in statistical inquiries.
If a theory is first proposed and is then confirmed by the
examination of statistics, we are inclined to attach more weight to it than
to a theory which is constructed in order to suit the statistics.
But the fact that the theory which precedes the statistics is more
likely than the other to be supported by general considerations
(for it has not, presumably, been adopted for no reason at all)
constitutes the only valid ground for this preference. If it does
not receive more support than the other from general
considerations, then the circumstances of its origin are no argument in its
favour. The opposite view, which the unreliability of some
statisticians has brought into existence (that it is a positive
advantage to approach statistical evidence without
preconceptions based on general grounds, because the temptation to 'cook'
the evidence will prove otherwise to be irresistible), has no
logical basis and need only be considered when the impartiality of
an investigator is in doubt.


CHAPTER   XXVI 

THE    APPLICATION    OF    PROBABILITY   TO   CONDUCT 

1.  GIVEN  as  our  basis  what  knowledge  we  actually  have,  the 
probable,  I  have  said,  is  that  which  it  is  rational  for  us  to  believe. 
This  is  not  a  definition.  For  it  is  not  rational  for  us  to  believe 
that  the  probable  is  true  ;  it  is  only  rational  to  have  a  probable 
belief  in  it  or  to  believe  it  in  preference  to  alternative  beliefs.  To 
believe  one  thing  in  preference  to  another,  as  distinct  from  believing 
the  first  true  or  more  probable  and  the  second  false  or  less  probable, 
must have reference to action and must be a loose way of
expressing the propriety of acting on one hypothesis rather than
on another. We might put it, therefore, that the probable is
the hypothesis on which it is rational for us to act. It is, however,
not  so  simple  as  this,  for  the  obvious  reason  that  of  two  hypotheses 
it  may  be  rational  to  act  on  the  less  probable  if  it  leads  to  the 
greater  good.  We  cannot  say  more  at  present  than  that  the 
probability  of  a  hypothesis  is  one  of  the  things  to  be  determined 
and  taken  account  of  before  acting  on  it. 

2.  I  do  not  know  of  passages  in  the  ancient  philosophers  which 
explicitly  point  out  the  dependence  of  the  duty  of  pursuing 
goods  on  the  reasonable  or  probable  expectation  of  attaining 
them  relative  to  the  agent's  knowledge.  This  means  only  that 
analysis  had  not  disentangled  the  various  elements  in  rational 
action,  not  that  common  sense  neglected  them.  Herodotus 
puts  the  point  quite  plainly.  "  There  is  nothing  more  profitable 
for  a  man,"  he  says,  "  than  to  take  good  counsel  with  himself  ; 
for  even  if  the  event  turns  out  contrary  to  one's  hope,  still  one's 
decision was right, even though fortune has made it of no effect:
whereas if a man acts contrary to good counsel, although by luck
he gets what he had no right to expect, his decision was not any
the less foolish."1

1  Herod,  vii.  10. 
3. The first contact of theories of probability with modern
ethics  appears  in  the  Jesuit  doctrine  of  probabilism.  According 
to  this  doctrine  one  is  justified  in  doing  an  action  for  which  there 
is  any  probability,  however  small,  of  its  results  being  the  best 
possible.  Thus,  if  any  priest  is  willing  to  permit  an  action,  that 
fact  affords  some  probability  in  its  favour,  and  one  will  not  be 
damned  for  performing  it,  however  many  other  priests  denounce 
it.1  It  may  be  suspected,  however,  that  the  object  of  this 
doctrine was not so much duty as safety. The priest who
permitted you so to act assumed thereby the responsibility. The
correct  application  of  probability  to  conduct  naturally  escaped 
the  authors  of  a  juridical  ethics,  which  was  more  interested  in 
the  fixing  of  responsibility  for  definite  acts,  and  in  the  various 
specified  means  by  which  responsibility  might  be  disposed  of, 
than  in  the  greatest  possible  sum-total  of  resultant  good. 

A  more  correct  doctrine  was  brought  to  light  by  the  efforts  of 
the philosophers of the Port Royal to expose the fallacies of
probabilism. "In order to judge," they say, "of what we ought to
do  in  order  to  obtain  a  good  and  to  avoid  an  evil,  it  is  necessary 
to  consider  not  only  the  good  and  evil  in  themselves,  but  also 
the  probability  of  their  happening  and  not  happening,  and  to 
regard  geometrically  the  proportion  which  all  these  things  have, 
taken  together."  2  Locke  perceived  the  same  point,  although 
not  so  clearly.3  By  Leibniz  this  theory  is  advanced  more 
explicitly  ;  in  such  judgments,  he  says,  "  as  in  other  estimates 
disparate  and  heterogeneous  and,  so  to  speak,  of  more  than  one 
dimension,  the  greatness  of  that  which  is  discussed  is  in  reason 
composed  of  both  estimates  (i.e.  of  goodness  and  of  probability), 
and  is  like  a  rectangle,  in  which  there  are  two  considerations, 
viz.  that  of  length  and  that  of  breadth.  .  .  .  Thus  we  should 

1  Compare  with  this  doctrine  the  following  curious  passage  from  Jeremy 
Taylor: "We being the persons that are to be persuaded, we must see that
we   be  persuaded  reasonably.     And   it  is  unreasonable  to  assent  to  a  lesser 
evidence  when  a  greater  and  clearer  is  propounded :  but  of  that  every  man  for 
himself  is  to  take  cognisance,  if  he  be  able  to  judge ;  if  he  be  not,  he  is  not 
bound    under  the   tie   of   necessity   to   know   anything   of  it.      That   that    is 
necessary  shall  be  certainly  conveyed  to  him  :  God,  that  best  can,  will  certainly 
take  care  for  that ;  for  if  he  does  not,  it  becomes  to  be  not  necessary  ;  or  if  it 
should  still  remain  necessary,  and  he  be  damned  for  not  knowing  it,  and  yet  to 
know  it  be  not  in  his  power,  then  who  can  help  it !     There  can  be  no  further 
care  in  this  business." 

2 The Port Royal Logic (1662), Eng. Trans., p. 367.

3  Essay  concerning  Human  Understanding,  book  ii.  chap.  xxi.  §  66. 
still need the art of thinking and that of estimating probabilities,
besides the knowledge of the value of goods and evils, in order
properly to employ the art of consequences."1
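Leibniz's 'rectangle' can be sketched as a product of probability and goodness. This is a minimal illustration under a loud assumption: that both magnitudes are numerically measurable, which the text elsewhere doubts for many real cases. The two courses and their numbers are hypothetical. The sketch shows the point made at the start of the chapter, that it may be rational to act on the less probable hypothesis when it leads to the greater good.

```python
from dataclasses import dataclass


@dataclass
class Course:
    """A course of action: a hypothetical good and the probability of attaining it."""
    name: str
    probability: float  # probability that the good is attained (assumed numerical)
    goodness: float     # magnitude of the good if attained (assumed numerical)

    @property
    def rectangle(self) -> float:
        # Leibniz's 'rectangle': breadth (probability) times length (goodness).
        return self.probability * self.goodness


def preferred(courses: list[Course]) -> Course:
    """Choose the course whose rectangle is greatest."""
    return max(courses, key=lambda c: c.rectangle)


courses = [Course("likely, modest good", 0.7, 10.0),
           Course("unlikely, great good", 0.3, 40.0)]
choice = preferred(courses)
print(choice.name, round(choice.rectangle, 6))
```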

In his preface to the Analogy Butler insists on "the absolute
and  formal  obligation  "  under  which  even  a  low  probability, 
if  it  is  the  greatest,  may  lay  us  :  "  To  us  probability  is  the  very 
guide  of  life." 

4. With the development of a utilitarian ethics largely
concerned with the summing up of consequences, the place of
probability in ethical theory has become much more explicit. But
although the general outlines of the problem are now clear, there
are some elements of confusion not yet dispersed. I will deal with
some of them.

In  his  Principia  Ethica  (p.  152)  Dr.  Moore  argues  that  "  the 
first  difficulty  in  the  way  of  establishing  a  probability  that  one 
course  of  action  will  give  a  better  total  result  than  another,  lies 
in  the  fact  that  we  have  to  take  account  of  the  effects  of  both 
throughout  an  infinite  future.  .  .  .  We  can  certainly  only  pretend 
to  calculate  the  effects  of  actions  within  what  may  be  called  an 
'  immediate  future.'  .  .  .  We  must,  therefore,  certainly  have 
some  reason  to  believe,  that  no  consequences  of  our  action  in  a 
farther  future  will  generally  be  such  as  to  reverse  the  balance  of 
good  that  is  probable  in  the  future  which  we  can  foresee.  This 
large postulate must be made, if we are ever to assert that the
results of one action will be even probably better than those of
another. Our utter ignorance of the far future gives us no
justification for saying that it is even probably right to choose the
greater  good  within  the  region  over  which  a  probable  forecast 
may  extend." 

This argument seems to me to be invalid and to depend on a wrong philosophical interpretation of probability. Mr. Moore's reasoning endeavours to show that there is not even a probability by showing that there is not a certainty. We must not, of course, have reason to believe that remote consequences will generally be such as to reverse the balance of immediate good, but we need not be certain that the opposite is the case. If good is additive, if we have reason to think that of two actions one produces more good than the other in the near future, and if we have no means of discriminating between their results in the distant future, then by what seems a legitimate application of the Principle of Indifference we may suppose that there is a probability in favour of the former action. Mr. Moore's argument must be derived from the empirical or frequency theory of probability, according to which we must know for certain what will happen generally (whatever that may mean) before we can assert a probability.

¹ Nouveaux Essais, book ii. chap. xxi.
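The Principle of Indifference argument can be illustrated numerically. The sketch below uses a hypothetical model, not one found in the text. It assumes that the distant-future effects of the two actions are independent draws from one and the same normal distribution. That assumption is our stand-in for having "no means of discriminating between their results." The sketch then estimates how often the action with the better near-future result comes out ahead on the whole. The function name `prob_first_better` and the `spread` parameter are illustrative assumptions.

```python
import random

def prob_first_better(near_good_1, near_good_2, spread, trials=100_000, seed=0):
    """Estimate P(total good of action 1 > total good of action 2).

    Hypothetical model: each action's distant-future effect is drawn
    independently from the SAME normal distribution (mean 0, std `spread`),
    encoding our inability to discriminate between remote consequences.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        distant_1 = rng.gauss(0.0, spread)
        distant_2 = rng.gauss(0.0, spread)
        if near_good_1 + distant_1 > near_good_2 + distant_2:
            wins += 1
    return wins / trials

# A small near-future advantage, swamped in magnitude by remote uncertainty.
print(prob_first_better(1.0, 0.0, spread=10.0))
```

With a near-future advantage of 1 and a distant spread of 10, the estimate comes out a little above one half. Under these assumptions the remote uncertainty weakens the probability in favour of the first action but does not reverse it.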

The results of our endeavours are very uncertain, but we have a genuine probability, even when the evidence upon which it is founded is slight. The matter is truly stated by Bishop Butler: "From our short views it is greatly uncertain whether this endeavour will, in particular instances, produce an overbalance of happiness upon the whole; since so many and distant things must come into the account. And that which makes it our duty is that there is some appearance that it will, and no positive appearance to balance this, on the contrary side. . . ."¹

The  difficulties  which  exist  are  not  chiefly  due,  I  think,  to  our 
ignorance  of  the  remote  future.  The  possibility  of  our  knowing 
that  one  thing  rather  than  another  is  our  duty  depends  upon  the 
assumption  that  a  greater  goodness  in  any  part  makes,  in  the 
absence  of  evidence  to  the  contrary,  a  greater  goodness  in  the 
whole  more  probable  than  would  the  lesser  goodness  of  the  part. 
We  assume  that  the  goodness  of  a  part  is  favourably  relevant  to 
the  goodness  of  the  whole.  Without  this  assumption  we  have  no 
reason,  not  even  a  probable  one,  for  preferring  one  action  to  any 
other  on  the  whole.  If  we  suppose  that  goodness  is  always 
organic,  whether  the  whole  is  composed  of  simultaneous  or 
successive  parts,  such  an  assumption  is  not  easily  justified.  The 
case  is  parallel  to  the  question,  whether  physical  law  is  organic  or 
atomic,  discussed  in  Chapter  XXI.  §  6. 

Nevertheless we can admit that goodness is partly organic and still allow ourselves to draw probable conclusions. For the alternatives, that either the goodness of the whole universe throughout time is organic or the goodness of the universe is the arithmetic sum of the goodnesses of infinitely numerous and infinitely divided parts, are not exhaustive. We may suppose that the goodness of conscious persons is organic for each distinct and individual personality. Or we may suppose that, when conscious units are in conscious relationship, then the whole which we must treat as organic includes both units. These are only examples. We must suppose, in general, that the units whose goodness we must regard as organic and indivisible are not always larger than those the goodness of which we can perceive and judge directly.

¹ This passage is from the Analogy. The Bishop adds: ". . . and also that such benevolent endeavour is a cultivation of that most excellent of all virtuous principles, the active principle of benevolence."

5. The difficulties, however, which are most fundamental from the standpoint of the student of probability, are of a different kind. Normal ethical theory at the present day, if there can be said to be any such, makes two assumptions. The first is that degrees of goodness are numerically measurable and arithmetically additive. The second is that degrees of probability also are numerically measurable. This theory goes on to maintain that what we ought to add together, when, in order to decide between two courses of action, we sum up the results of each, are the 'mathematical expectations' of the several results. 'Mathematical expectation' is a technical expression originally derived from the scientific study of gambling and games of chance, and stands for the product of the possible gain with the probability of attaining it.¹ In order to obtain, therefore, a measure of what ought to be our preference in regard to various alternative courses of action, we must sum for each course of action a series of terms made up of the amounts of good which may attach to each of its possible consequences, each multiplied by its appropriate probability.
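The procedure just described can be written out directly: for each course of action, sum the products of good and probability. This is a minimal sketch. It assumes, as the 'normal ethical theory' does, that both goods and probabilities are given as numbers. The courses and figures are hypothetical, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

def mathematical_expectation(consequences):
    """Sum of (amount of good) * (probability) over the possible consequences.

    `consequences` is a list of (good, probability) pairs for one course of action.
    """
    return sum(good * prob for good, prob in consequences)

def preferred_course(courses):
    """Name of the course whose mathematical expectation is largest."""
    return max(courses, key=lambda name: mathematical_expectation(courses[name]))

# Hypothetical courses of action: a large but unlikely good versus a small, likely one.
courses = {
    "A": [(10, Fraction(1, 4)), (0, Fraction(3, 4))],
    "B": [(3, Fraction(9, 10)), (0, Fraction(1, 10))],
}
print({name: mathematical_expectation(c) for name, c in courses.items()})
print(preferred_course(courses))
```

On these figures course B is preferred, with an expectation of 27/10 against 5/2 for course A. The rule delivers this verdict mechanically, which is exactly what the following sections call into question.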

The first assumption, that quantities of goodness are duly subject to the laws of arithmetic, appears to me to be open to a certain amount of doubt. But it would take me too far from my proper subject to discuss it here, and I shall allow, for the purposes of further argument, that in some sense and to some extent this assumption can be justified. The second assumption, however, that degrees of probability are wholly subject to the laws of arithmetic, runs directly counter to the view which has been advocated in Part I. of this treatise. Lastly, if both these points be waived, the doctrine that the 'mathematical expectations' of alternative courses of action are the proper measures of our degrees of preference is open to doubt on two grounds. First, it ignores what I have termed in Part I. the 'weights' of the arguments, namely, the amount of evidence upon which each probability is founded. Second, it ignores the element of 'risk' and assumes that an even chance of heaven or hell is precisely as much to be desired as the certain attainment of a state of mediocrity. Putting on one side the first of these grounds of doubt, I will treat each of the others in turn.

¹ Priority in the conception of mathematical expectation can, I think, be claimed by Leibniz, De incerti aestimatione, 1678 (Couturat, Logique de Leibniz, p. 248). In a letter to Placcius, 1687 (Dutens, vi. i. 30 and Couturat, op. cit. p. 240), Leibniz proposed an application of the same principle to jurisprudence, by virtue of which, if two litigants lay claim to a sum of money, and if the claim of the one is twice as probable as that of the other, the sum should be divided between them in that proportion. The doctrine seems sensible, but I am not aware that it has ever been acted on.

6. In Chapter III. of Part I. I have argued that only in a strictly limited class of cases are degrees of probability numerically measurable. It follows from this that the 'mathematical expectations' of goods or advantages are not always numerically measurable. Hence, even if a meaning can be given to the sum of a series of non-numerical 'mathematical expectations,' not every pair of such sums is numerically comparable in respect of more and less. Thus even if we know the degree of advantage which might be obtained from each of a series of alternative courses of action, and know also the probability in each case of obtaining the advantage in question, it is not always possible by a mere process of arithmetic to determine which of the alternatives ought to be chosen. If, therefore, the question of right action is under all circumstances a determinate problem, it must be in virtue of an intuitive judgment directed to the situation as a whole. It cannot be in virtue of an arithmetical deduction derived from a series of separate judgments directed to the individual alternatives, each treated in isolation.

We must accept the conclusion that, if one good is greater than another but the probability of attaining the first is less than that of attaining the second, the question of which it is our duty to pursue may be indeterminate. This holds unless we suppose it to be within our power to make direct quantitative judgments of probability and goodness jointly. It may be remarked, further, that the difficulty exists whether the numerical indeterminateness of the probability is intrinsic, or whether its numerical value is, as it is according to the Frequency Theory and most other theories, simply unknown.
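One way to see why arithmetic alone may fail to decide is to suppose that a probability which is not numerically measurable can at least be bracketed between a lower and an upper bound. This is purely an illustrative device and not the theory of this treatise. The expectation is then only bracketed too, and when two such brackets overlap, no arithmetical comparison settles the choice. The function names and the bounds used are hypothetical.

```python
from fractions import Fraction

def expectation_bounds(good, prob_low, prob_high):
    """Bracket good * probability when the probability is only known to lie
    between prob_low and prob_high (good is assumed non-negative)."""
    return good * prob_low, good * prob_high

def compare(bounds_a, bounds_b):
    """Return 'A' or 'B' if one bracket lies wholly above the other, else None."""
    low_a, high_a = bounds_a
    low_b, high_b = bounds_b
    if low_a > high_b:
        return "A"
    if low_b > high_a:
        return "B"
    return None  # overlapping brackets: arithmetic leaves the choice indeterminate

# A greater good with a smaller probability against a lesser good with a larger one.
greater_good = expectation_bounds(10, Fraction(1, 5), Fraction(2, 5))  # between 2 and 4
lesser_good = expectation_bounds(3, Fraction(4, 5), Fraction(1))       # between 12/5 and 3
print(compare(greater_good, lesser_good))
```

Here the brackets overlap, so `compare` returns `None`. Narrow the first probability to lie between 1/2 and 3/5 and the greater good is chosen outright.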

7. The second difficulty, to which attention is called above, is the neglect of the 'weights' of arguments in the conception of 'mathematical expectation.' In Chapter VI. of Part I. the significance of 'weight' has been discussed. In the present connection the question comes to this: if two probabilities are equal in degree, ought we, in choosing our course of action, to prefer that one which is based on a greater body of knowledge?

The question appears to me to be highly perplexing, and it is difficult to say much that is useful about it. But in making practical decisions, the degree of completeness of the information upon which a probability is based does seem to be relevant, as well as the actual magnitude of the probability. Bernoulli's maxim,¹ that in reckoning a probability we must take into account all the information which we have, does not seem completely to meet the case. This remains so even when it is reinforced by Locke's maxim that we must get all the information we can.² If, for one alternative, the available information is necessarily small, that does not seem to be a consideration which ought to be left out of account altogether.

¹ Ars Conjectandi, p. 215: "It is not sufficient to weigh one or another argument; rather, all must be sought out that can come to our knowledge and that seem in any way to bear on the proof of the matter."

² Essay concerning Human Understanding, book ii. chap. xxi. § 67: "He that judges without informing himself to the utmost that he is capable, cannot acquit himself of judging amiss."

8. The last difficulty concerns the question whether, the former difficulties being waived, the 'mathematical expectation' of different courses of action accurately measures what our preferences ought to be. That is to say: does the undesirability of a given course of action increase in direct proportion to any increase in the uncertainty of its attaining its object? Or ought some allowance to be made for 'risk,' its undesirability increasing more than in proportion to its uncertainty?

In fact the meaning of the judgment, that we ought to act in such a way as to produce most probably the greatest sum of goodness, is not perfectly plain. Does this mean that we ought so to act as to make a maximum the sum of the goodnesses of each of the possible consequences of our action, each multiplied by its probability? Those who rely on the conception of 'mathematical expectation' must hold that this is an indisputable proposition. The justifications for this view most commonly advanced resemble that given by Condorcet in his "Réflexions sur la règle générale, qui prescrit de prendre pour valeur d'un événement incertain, la probabilité de cet événement multipliée par la valeur de l'événement en lui-même."¹ There he argues from Bernoulli's theorem that such a rule will lead to satisfactory results if a very large number of trials be made. It will be shown in Chapter XXIX. of Part V., however, that Bernoulli's theorem is by no means applicable in every case; this argument is therefore inadequate as a general justification.
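Condorcet's appeal to Bernoulli's theorem amounts to the claim that, over very many independent repetitions, the average gain per trial settles near the mathematical expectation. The simulation below is a hypothetical sketch of that claim for a single repeated wager. It presupposes exactly the conditions whose general applicability is disputed above: many independent, identical trials. It says nothing about a single decision.

```python
import random

def average_gain(prize, prob, trials, seed=1):
    """Average winnings per trial over `trials` independent repetitions of a
    wager that pays `prize` with probability `prob` and nothing otherwise."""
    rng = random.Random(seed)
    total = sum(prize for _ in range(trials) if rng.random() < prob)
    return total / trials

# The mathematical expectation of one such wager is 10 * 0.3 = 3.
for n in (10, 1_000, 100_000):
    print(n, average_gain(10, 0.3, n))
```

Over a handful of trials the average may lie far from 3, and only a long run brings it close. That is why the argument supports the rule only where such a long run is actually in prospect.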

In the history of the subject, nevertheless, the theory of 'mathematical expectation' has been very seldom disputed. D'Alembert has been almost alone in casting serious doubts upon it, though he only brought himself into disrepute by doing so. It will therefore be worth while to quote the main passage in which he declares his scepticism: "It seemed to me" (in reading Bernoulli's Ars Conjectandi) "that this matter needed to be treated in a clearer manner. I saw clearly that the expectation was greater, 1° the greater the sum hoped for, 2° the greater the probability of winning it. But I did not see with the same evidence, and I do not see it yet, 1° that the probability is estimated exactly by the methods in use; 2° that, even if it were, the expectation must be proportional to this simple probability, rather than to a power or even to a function of this probability; 3° that when there are several combinations which give different advantages or different risks (which are regarded as negative advantages), one must be content simply to add together all the expectations in order to obtain the total expectation."²

In extreme cases it seems difficult to deny some force to D'Alembert's objection, and it was with reference to extreme cases that he himself raised it. Is it certain that a larger good, which is extremely improbable, is precisely equivalent ethically to a smaller good which is proportionately more probable? We may doubt whether the moral value of speculative and cautious action respectively can be weighed against one another in a simple arithmetical way. In the same way we have already doubted whether a good whose probability can only be determined on a slight basis of evidence can be compared with another good, whose likelihood is based on completer knowledge, by means merely of the magnitude of this probability.

¹ Hist. de l'Acad., Paris, 1781.

² Opuscules mathématiques, vol. iv., 1768 (extracts from letters), pp. 284, 285. See also p. 88 of the same volume.

There seems, at any rate, a good deal to be said for the conclusion that, other things being equal, that course of action is preferable which involves least risk, and about the results of which we have the most complete knowledge. In marginal cases, therefore, the coefficients of weight and risk as well as that of probability are relevant to our conclusion. It seems natural to suppose that they should exert some influence in other cases also, the only difficulty in this being the lack of any principle for the calculation of the degree of their influence. A high weight and the absence of risk increase pro tanto the desirability of the action to which they refer, but we cannot measure the amount of the increase.

The 'risk' may be defined in some such way as follows. Let $A$ be the amount of good which may result and $p$ its probability ($p + q = 1$). Let $E$ be the value of the 'mathematical expectation,' so that $E = pA$. Then the 'risk' is $R$, where

$$R = p(A - E) = p(1 - p)A = pqA = qE.$$

This may be put in another way. $E$ measures the net immediate sacrifice which should be made in the hope of obtaining $A$, and $q$ is the probability that this sacrifice will be made in vain, so that $qE$ is the 'risk.'¹ The ordinary theory supposes that the ethical value of an expectation is a function of $E$ only and is entirely independent of $R$.
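The definition of risk can be checked with exact arithmetic. The sketch below (the function name is ours) computes $E$ and $R$ from $A$ and $p$. It also confirms that the alternative forms of $R$ agree, namely $p(A-E)$, $pqA$ and $qE$. The figures in the usage lines are hypothetical. They are chosen to echo the contrast between an even chance of a large good and the certainty of a moderate one.

```python
from fractions import Fraction

def expectation_and_risk(good, p):
    """Return (E, R) for a good A attainable with probability p."""
    q = 1 - p
    expectation = p * good              # E = pA
    risk = p * (good - expectation)     # R = p(A - E)
    # The alternative forms of R must agree: p(A - E) = pqA = qE.
    assert risk == p * q * good == q * expectation
    return expectation, risk

# Same expectation, different risk: an even chance of 100 versus a certain 50.
print(expectation_and_risk(100, Fraction(1, 2)))   # E = 50, R = 25
print(expectation_and_risk(50, Fraction(1)))       # E = 50, R = 0
```

Both prospects have the same $E$, so the ordinary theory ranks them equal, while $R$ separates them.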

We could, if we liked, define a conventional coefficient $c$ of weight and risk, such as

$$c = \frac{2pw}{(1+q)(1+w)},$$

where $w$ measures the 'weight.' The coefficient $c$ is equal to unity when $p = 1$ and $w = 1$, is equal to zero when $p = 0$ or $w = 0$, and has an intermediate value in other cases.² But suppose that doubts as to the sufficiency of the conception of 'mathematical expectation' are sustained. Then it is not likely that the solution will lie, as D'Alembert suggests and as has been exemplified above, in the discovery of some more complicated function of the probability wherewith to compound the proposed good. The judgment of goodness and the judgment of probability both involve somewhere an element of direct apprehension, and both are quantitative. We have raised a doubt as to whether the magnitude of the 'oughtness' of an action can in all cases be directly determined by simply multiplying together the magnitudes obtained in the two direct judgments. A new direct judgment may be required, respecting the magnitude of the 'oughtness' of an action under given circumstances, which need not bear any simple and necessary relation to the two former.

¹ The theory of risk is briefly dealt with by Czuber, Wahrscheinlichkeitsrechnung, vol. ii. If $R$ measures the first insurance, this leads to a risk of the second order, $R_1 = qR = q^2E$. This again may be insured against, and by a sufficient number of such reinsurances the risk can be completely shifted:

$$E + R + R_1 + \cdots = E(1 + q + q^2 + \cdots) = \frac{E}{1-q} = \frac{E}{p} = A.$$

² The following comparisons hold:
- If $pA = p'A'$, $w = w'$ and $q < q'$, then $cA > c'A'$.
- If $pA = p'A'$, $w > w'$ and $q = q'$, then $cA > c'A'$.
- If $pA > p'A'$, $w = w'$ and $q = q'$, then $cA > c'A'$.
- But if $pA = p'A'$, $w > w'$ and $q > q'$, we cannot in general compare $cA$ and $c'A'$.
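The conventional coefficient $c = 2pw/((1+q)(1+w))$ can be checked against its stated boundary values. Separately, one can take the repeated insuring of a risk, each stage leaving a further risk $q$ times the last, and sum the series $E + qE + q^2E + \cdots$ to watch it approach $A$. This is a sketch with exact fractions. The function names and test values are illustrative assumptions, and $c$ is, as the text says, only a conventional choice.

```python
from fractions import Fraction

def weight_risk_coefficient(p, w):
    """Conventional coefficient c = 2pw / ((1 + q)(1 + w)), with q = 1 - p."""
    q = 1 - p
    return 2 * p * w / ((1 + q) * (1 + w))

def reinsured_total(good, p, rounds):
    """First `rounds` terms of E + qE + q^2 E + ..., where each insurance
    leaves a further risk q times as large as the one before."""
    q = 1 - p
    expectation = p * good
    return sum(expectation * q**k for k in range(rounds))

one, half = Fraction(1), Fraction(1, 2)
print(weight_risk_coefficient(one, one))    # 1 when p = 1 and w = 1
print(weight_risk_coefficient(0, half))     # 0 when p = 0
print(weight_risk_coefficient(half, 0))     # 0 when w = 0
print(weight_risk_coefficient(half, half))  # an intermediate value, 2/9
print(reinsured_total(100, half, 20))       # just short of A = 100
```

After $n$ rounds the total is $A(1 - q^n)$, so with $q = 1/2$ it falls short of $A$ by $A/2^n$.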

The hope, which sustained many investigators in the course of the nineteenth century, of gradually bringing the moral sciences under the sway of mathematical reasoning, steadily recedes, if we mean, as they meant, by mathematics the introduction of precise numerical methods. The old assumptions, that all quantity is numerical and that all quantitative characteristics are additive, can no longer be sustained. Mathematical reasoning now appears as an aid in its symbolic rather than in its numerical character. I, at any rate, have not the same lively hope as Condorcet, or even as Edgeworth, "to illuminate the moral and political sciences with the torch of Algebra." In the present case, we may be able to range goods in order of magnitude, and also their probabilities in order of magnitude. Yet it does not follow that we can range in this order the products composed of each good and its corresponding probability.
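A small numerical example shows why the two orderings do not fix the order of the products. Below, two hypothetical assignments of numbers agree in their rankings: the first good is above the second, and the first probability is below the second. Yet they rank the products oppositely. The figures are illustrative only.

```python
from fractions import Fraction

def larger_product(goods, probs):
    """Which option has the larger product of good and probability?"""
    first = goods[0] * probs[0]
    second = goods[1] * probs[1]
    if first == second:
        return "tie"
    return "first" if first > second else "second"

goods = (10, 4)  # first good ranked above the second in both cases
case_1 = (Fraction(3, 10), Fraction(1, 2))  # first probability ranked below the second
case_2 = (Fraction(1, 10), Fraction(1, 2))  # same rankings, different magnitudes

print(larger_product(goods, case_1))  # products 3 and 2
print(larger_product(goods, case_2))  # products 1 and 2
```

Knowledge of the two orderings alone cannot decide between these cases, since it is identical in both.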

9. Discussions of the doctrine of Mathematical Expectation, apart from its directly ethical bearing, have chiefly centred round the classic Petersburg Paradox.¹ It has been treated by almost all the more notable writers, and they have explained it in a great variety of ways. The Petersburg Paradox arises out of a game in which Peter engages to pay Paul one shilling if a head appears at the first toss of a coin, two shillings if it does not appear until the second, and, in general, $2^{r-1}$ shillings if no head appears until the $r$th toss. What is the value of Paul's expectation, and what sum must he hand over to Peter before the game commences, if the conditions are to be fair?

¹ For the history of this paradox see Todhunter. The name is due, he says, to its having first appeared in a memoir by Daniel Bernoulli in the Commentarii of the Petersburg Academy.
The mathematical answer is $\sum_{r=1}^{n} \left(\tfrac{1}{2}\right)^{r} 2^{r-1}$ if the number of tosses is not in any case to exceed $n$ in all, and $\sum_{r=1}^{\infty} \left(\tfrac{1}{2}\right)^{r} 2^{r-1}$ if this restriction is removed. That is to say, Paul should pay $n/2$ shillings in the first case, and an infinite sum in the second. Nothing, it is said, could be more paradoxical, and no sane Paul would engage on these terms even with an honest Peter.
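The sums above can be verified with exact fractions. Every term $(\tfrac12)^r 2^{r-1}$ equals one half, so the truncated game is worth $n/2$ and its value grows without bound as $n$ increases. This is a minimal sketch, and the function name is illustrative.

```python
from fractions import Fraction

def petersburg_value(max_tosses):
    """Paul's expectation in shillings if play stops after `max_tosses` tosses:
    a first head at toss r (probability 1/2**r) pays 2**(r - 1) shillings."""
    return sum(Fraction(1, 2**r) * 2**(r - 1) for r in range(1, max_tosses + 1))

for n in (1, 10, 100, 1000):
    print(n, petersburg_value(n))  # n/2 shillings: the fair stake has no upper limit
```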

Many of the solutions which have been offered will occur at once to the reader. The conditions of the game imply contradiction, say Poisson and Condorcet. Peter has undertaken engagements which he cannot fulfil: if the appearance of heads is deferred even to the 100th toss, he will owe a mass of silver greater in bulk than the sun. But this is no answer. Peter has promised much, and a belief in his solvency will strain our imagination, but it is imaginable. And in any case, as Bertrand points out, we may suppose the stakes to be, not shillings, but grains of sand or molecules of hydrogen.

D'Alembert's principal explanations are, first, that true expectation is not necessarily the product of probability and profit (a view which has been discussed above), and second, that very long runs are not only very improbable, but do not occur at all.

The next type of solution is due, in the first instance, to Daniel Bernoulli. It turns on the fact that no one but a miser regards the desirability of different sums of money as directly proportional to their amount. As Buffon says, "The miser is like the mathematician: both value money by its numerical quantity." Daniel Bernoulli deduced a formula from the assumption that the importance of an increment is inversely proportional to the size of the fortune to which it is added. Thus, if $x$ is the 'physical' fortune and $y$ the 'moral' fortune,

$$dy = k\,\frac{dx}{x},$$

or $y = k \log \dfrac{x}{a}$, where $k$ and $a$ are constants.
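For readers who want the intermediate step, integrating Bernoulli's assumption gives the logarithmic law directly. Here $a$ is read as the fortune at which the 'moral' fortune is reckoned as zero. That reading of the constant is an assumption of this note.

```latex
dy = k\,\frac{dx}{x}
\quad\Longrightarrow\quad
y = \int_{a}^{x} \frac{k}{t}\,dt = k\,(\log x - \log a) = k \log \frac{x}{a},
\qquad y = 0 \text{ when } x = a .
```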

On the basis of this formula of Bernoulli's a considerable theory has been built up, both by Bernoulli¹ himself and by Laplace.² It leads easily to the further formula

$$x = (a + x_1)^{p_1} (a + x_2)^{p_2} \cdots,$$

where $a$ is the initial 'physical' fortune and $p_1$, etc., are the probabilities of obtaining increments $x_1$, etc., to $a$. Here $x$ is the 'physical' fortune whose present possession would yield the same 'moral' fortune as does the expectation of the various increments $x_1$, etc. By means of this formula Bernoulli shows that a man whose fortune is £1000 may reasonably pay a £6 stake in order to play the Petersburg game with £1 units. Bernoulli also mentions two solutions proposed by Cramer. In the first, all sums greater than $2^{24}$ (16,777,216) are regarded as 'morally' equal; this leads to £13 as the fair stake. According to the other formula, the pleasure derivable from a sum of money varies as the square root of the sum; this leads to £2 : 9s. as the fair stake. But little object is served by following out these arbitrary hypotheses.
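The figures quoted for Bernoulli and for Cramer's first hypothesis can be reproduced approximately. The sketch below makes several simplifying assumptions, all our own:
- It takes £1 units, so a first head at toss $r$ pays $2^{r-1}$ pounds.
- It truncates the infinite series at 200 tosses, where the neglected tail is negligible.
- For Bernoulli, it computes the sure gain whose possession would give the same logarithmic 'moral' fortune as the game, that is, $\prod_r (a + 2^{r-1})^{p_r} - a$ with $p_r = 2^{-r}$. It ignores the effect of paying the stake itself.

```python
import math

MAX_TOSSES = 200  # truncation of the infinite series; the tail beyond this is negligible

def bernoulli_equivalent(fortune):
    """Sure gain giving the same log 'moral' fortune as playing once:
    exp(sum_r p_r * log(fortune + 2**(r-1))) - fortune, with p_r = 2**-r."""
    log_moral = 0.0
    for r in range(1, MAX_TOSSES + 1):
        log_moral += 0.5**r * math.log(fortune + 2**(r - 1))
    # The probability left over beyond MAX_TOSSES (about 2**-200) is ignored.
    return math.exp(log_moral) - fortune

def cramer_capped_value(cap_exponent=24):
    """Expectation when every prize above 2**cap_exponent counts only as 2**cap_exponent."""
    cap = 2**cap_exponent
    return sum(min(2**(r - 1), cap) / 2**r for r in range(1, MAX_TOSSES + 1))

print(round(bernoulli_equivalent(1000), 2))  # roughly 6 for a £1000 fortune
print(round(cramer_capped_value(), 6))       # 13 under the 2**24 cap
```

Under the cap, the first 25 tosses each contribute one half, and all later tosses together contribute another half. That gives the £13 quoted above.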
little  object  is  served  by  following  out  these  arbitrary  hypotheses. 

As  a  solution  of  the  Petersburg  problem  this  line  of  thought 
is  only  partially  successful  :  if  increases  of  '  physical  '  fortune 
beyond  a  certain  finite  limit  can  be  regarded  as  '  morally  ' 
negligible,  Peter's  claim  for  an  infinite  initial  stake  from  Paul  is, 
it  is  true,  no  longer  equitable,  but  with  any  reasonable  law  of 
diminution  for  successive  increments  Paul's  stake  will  still  remain 
paradoxically  large.  Daniel  Bernoulli's  suggestion  is,  however, 
of  considerable  historical  interest  as  being  the  first  explicit 
attempt  to  take  account  of  the  important  conception  known  to 
modern  economists  as  the  diminishing  marginal  utility  of  money, 
—  a  conception  on  which  many  important  arguments  are  founded 
relating  to  taxation  and  the  ideal  distribution  of  wealth. 

Each of the above solutions probably contains a part of the psychological explanation. We are unwilling to be Paul for several reasons. We do not believe Peter will pay us if we have good fortune in the tossing. We do not know what we should do with so much money or sand or hydrogen if we won it. We do not believe we ever should win it. And we do not think it would be a rational act to risk an infinite sum, or even a very large finite sum, for an infinitely larger one whose attainment is infinitely unlikely.

¹ "Specimen Theoriae Novae de Mensura Sortis," Comm. Acad. Petrop. vol. v. for 1730 and 1731, pp. 175-192 (published 1738). See Todhunter, pp. 213 et seq.

² Théorie analytique, chap. x. "De l'espérance morale," pp. 432-445.

When we have made the proper hypotheses and have eliminated these elements of psychological doubt, some element of paradox remains. Its theoretic dispersal must be brought about, I think, by a development of the theory of risk. It is primarily the great risk of the wager which deters us. Even in the case where the number of tosses is in no case to exceed a finite number, the risk $R$, as already defined, may be very great, and the relative risk $R/E$ will be almost unity. Where there is no limit to the number of tosses, the risk is infinite. A relative risk which approaches unity may, it has been already suggested, be a factor which must be taken into account in ethical calculation.

10. In establishing the doctrine that all private gambling must be with certainty a losing game, precisely contrary arguments are employed to those which do service in the Petersburg problem. The argument that "you must lose if only you go on long enough" is well known. It is succinctly put by Laurent.¹ Two players A and B have $a$ and $b$ francs respectively, and $f(a)$ is the chance that A will be ruined. Thus $f(a) = \dfrac{b}{a+b}$,² so that the poorer a gambler is, relatively to his opponent, the more likely he is to be ruined. But further, if $b = \infty$, then $f(a) = 1$, i.e. ruin is certain. The infinitely rich gambler is the public. It is against the public that the professional gambler plays, and his ruin is therefore certain.

Might not Poisson and Condorcet reply that the conditions of the game imply contradiction, for no gambler plays, as this argument supposes, for ever?³ At the end of any finite quantity of play, the player, even if he is not the public, may finish with winnings of any finite size. The gambler is in a worse position if his capital is smaller than his opponents' (at poker, for instance, or on the Stock Exchange). This is clear. But our desire for moral improvement outstrips our logic if we tell him that he must lose. Besides, it is paradoxical to say that everybody individually must lose and that everybody collectively must win. For every individual gambler who loses there is an individual gambler or syndicate of gamblers who win. The true moral is this: poor men should not gamble, and millionaires should do nothing else. But millionaires gain nothing by gambling with one another, and until the poor man departs from the path of prudence the millionaire does not find his opportunity. It may be replied that in fact most millionaires are men originally poor who departed from the path of prudence. If so, it must be admitted that the poor man is not doomed with certainty. Thus the philosopher must draw what comfort he can from the conclusion with which his theory furnishes him, that millionaires are often fortunate fools who have thriven on unfortunate ones.

¹ Calcul des probabilités.

² This would possibly follow from the theorem of Daniel Bernoulli. The reasoning by which Laurent obtains it seems to be the result of a mistake.

³ Cf. also Mr. Bradley, Logic, p. 217.


11. In conclusion we may discuss a little further the conception of 'moral' risk, raised in § 8 and at the end of § 9. Bernoulli's formula crystallises the undoubted truth that the value of a sum of money to a man varies according to the amount he already possesses. But does the value of an amount of goodness also vary in this way? May it not be true that the addition of a given good to a man who already enjoys much good is less good than its bestowal on a man who has little? If this is the case, it follows that a smaller but relatively certain good is better than a greater but proportionately more uncertain good.

In order to assert this, we have only to accept a particular
theory of organic goodness, applications of which are common
enough in the mouths of political philosophers. It is at the root
of all principles of equality, which do not arise out of an assumed
diminishing marginal utility of money. It is behind the numerous
arguments that an equal distribution of benefits is better than a
very unequal distribution. If this is the case, it follows that, the
sum of the goods of all parts of a community taken together
being fixed, the organic good of the whole is greater the more
equally the benefits are divided amongst the individuals. If the
doctrine is to be accepted, moral risks, like financial risks, must
not be undertaken unless they promise a profit actuarially.
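The monetary case can be made concrete. The sketch below assumes Bernoulli's formula in its usual logarithmic form, where the value of total wealth W is taken as log W. The wealth and gains are invented for illustration. The sketch compares a certain gain with a 50-50 gamble of the same actuarial value. It then finds how large the gamble's upside must be before the gamble matches the certain gain, which shows why, on this assumption, a risk is worth running only if it promises an actuarial profit.

```python
import math

def moral_expectation(wealth, outcomes):
    """Bernoulli-style moral expectation, assuming log utility of total wealth.

    outcomes: list of (probability, gain) pairs whose probabilities sum to 1.
    """
    return sum(p * math.log(wealth + gain) for p, gain in outcomes)

def actuarial_value(outcomes):
    """Plain mathematical expectation of the gain."""
    return sum(p * gain for p, gain in outcomes)

def break_even_upside(wealth, certain_gain):
    """Upside x of a 50-50 gamble (gain 0 or x) whose moral expectation equals
    that of receiving certain_gain for sure; solves
    0.5*log(W) + 0.5*log(W + x) = log(W + certain_gain)."""
    return (wealth + certain_gain) ** 2 / wealth - wealth

wealth = 100.0
certain = [(1.0, 10.0)]             # a smaller but certain gain
gamble = [(0.5, 0.0), (0.5, 20.0)]  # same actuarial value, more uncertain

print(actuarial_value(certain), actuarial_value(gamble))  # 10.0 10.0
print(moral_expectation(wealth, certain) > moral_expectation(wealth, gamble))  # True
print(break_even_upside(wealth, 10.0))  # 21.0, i.e. an actuarial value of 10.5
```

On these figures the gamble must offer an expected gain of 10.5, not 10, before it is as good as the certain 10.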

1 From the social point of view, however, this moral against gambling may
be drawn: that those who start with the largest initial fortunes are most likely
to win, and that a given increment to the wealth of these benefits them, on the
assumption of a diminishing marginal utility of money, less than it injures those
from whom it is taken.
There is a great deal which could be said concerning such a
doctrine, but it would lead too far from what is relevant to the
study of Probability. One or two instances of its use, however,
may be taken from the literature of Probability. In his essay,
"Sur l'application du calcul des probabilités à l'inoculation de
la petite vérole,"1 D'Alembert points out that the community
would gain on the average if, by sacrificing the lives of one in five
of its citizens, it could ensure the health of the rest, but he argues
that no legislator could have the right to order such a sacrifice.
Galton, in his Probability, the Foundation of Eugenics, employed
an argument which depends essentially on the same point.
Suppose that the members of a certain class cause an average
detriment M to society, and that the mischiefs done by the
several individuals differ more or less from M by amounts whose
average is D, so that D is the average amount of the individual
deviations, all regarded as positive, from M; then, Galton argued,
the smaller D is, the stronger is the justification for taking such
drastic measures against the propagation of the class as would
be consonant to the feelings, if it were known that each individual
member caused a detriment M. The use of such arguments
seems to involve a qualification of the simple ethical doctrine
that right action should make the sum of the benefits of the
several individual consequences, each multiplied by its probability,
a maximum.
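Galton's two quantities can be computed directly. The figures below are hypothetical. They describe two classes with the same average detriment M but very different average deviations D. On Galton's reasoning, as paraphrased above, the same drastic measure would be much better justified for the first class than for the second.

```python
def galton_m_and_d(detriments):
    """Return (M, D): the average detriment and the average deviation of the
    individual detriments from M, all deviations counted as positive."""
    n = len(detriments)
    m = sum(detriments) / n
    d = sum(abs(x - m) for x in detriments) / n
    return m, d

# Hypothetical classes, both with average detriment M = 5
uniform_class = [4, 5, 5, 6]      # individuals all close to M: small D
scattered_class = [0, 0, 10, 10]  # same M, individuals differ widely: large D

print(galton_m_and_d(uniform_class))    # (5.0, 0.5)
print(galton_m_and_d(scattered_class))  # (5.0, 5.0)
```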

On the other hand, the opposite view is taken in the Port Royal
Logic and by Butler, when they argue that everything ought to
be sacrificed for the hope of heaven, even if its attainment be
thought infinitely improbable, since "the smallest degree of
facility for the attainment of salvation is of higher value than
all the blessings of the world put together."2 The argument is,
that we ought to follow a course of conduct which may with the
slightest probability lead to an infinite good, until it is logically
disproved that such a result of our action is impossible. The
Emperor who embraced the Roman Catholic religion, not because
he believed it, but because it offered insurance against a disaster
whose future occurrence, however improbable, he could not
certainly disprove, may not have considered, however, whether
the product of an infinitesimal probability and an infinite good
might not lead to a finite or infinitesimal result. In any case the
argument does not enable us to choose between different courses
of conduct, unless we have reason to suppose that one path is
more likely than another to lead to infinite good.

1 Opuscules mathématiques, vol. ii.

2 Port Royal Logic (Eng. trans.), p. …: "It belongs to infinite things alone,
as eternity and salvation, that they cannot be equalled by any temporal advantage;
and thus we ought never to place them in the balance with any of the
things of the world. This is why the smallest degree of facility for the attainment
of salvation is of higher value than all the blessings of the world put
together."
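The Emperor's oversight can be illustrated with sequences. Suppose the probability shrinks like 1/n**a and the good grows like n**b; the exponents are chosen arbitrarily for illustration. The product then behaves like n**(b - a). It stays finite, vanishes, or grows without bound depending on the relative rates, so "infinitesimal probability times infinite good" has no determinate value by itself. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def expected_goods(a, b, ns=(10, 1_000, 100_000)):
    """Product of a probability 1/n**a (tending to zero) and a good n**b
    (growing without bound), for increasing n."""
    return [Fraction(1, n ** a) * n ** b for n in ns]

print(expected_goods(1, 1))  # stays at 1: a finite result
print(expected_goods(2, 1))  # 1/10, 1/1000, 1/100000: an infinitesimal result
print(expected_goods(1, 2))  # 10, 1000, 100000: an infinite result
```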

12. In estimating the risk, 'moral' or 'physical', it must be
remembered that we cannot necessarily apply to individual
cases results drawn from the observation of a long series resembling
them in some particular. I am thinking of such arguments
as Buffon's when he names 1/10,000 as the limit, beyond
which probability is negligible, on the ground that, being the
chance that a man of fifty-six taken at random will die within a
day, it is practically disregarded by a man of fifty-six who knows
his health to be good. "If a public lottery," Gibbon truly pointed
out, "were drawn for the choice of an immediate victim, and if
our name were inscribed on one of the ten thousand tickets,
should we be perfectly easy?"

Bernoulli's second axiom,1 that in reckoning a probability
we must take everything into account, is easily forgotten in these
cases of statistical probabilities. The statistical result is so
attractive in its definiteness that it leads us to forget the more
vague though more important considerations which may be, in a
given particular case, within our knowledge. To a stranger the
probability that I shall send a letter to the post unstamped may
be derived from the statistics of the Post Office; for me those
figures would have but the slightest bearing upon the question.

13. It has been pointed out already that no knowledge of
probabilities, less in degree than certainty, helps us to know what
conclusions are true, and that there is no direct relation between
the truth of a proposition and its probability. Probability begins
and ends with probability. That a scientific investigation
pursued on account of its probability will generally lead to truth,
rather than falsehood, is at the best only probable. The proposition
that a course of action guided by the most probable
considerations will generally lead to success, is not certainly true
and has nothing to recommend it but its probability.
The importance of probability can only be derived from the
judgment that it is rational to be guided by it in action; and a
practical dependence on it can only be justified by a judgment
that in action we ought to act to take some account of it. It is
for this reason that probability is to us the "guide of life," since
to us, as Locke says, "in the greatest part of our concernment,
God has afforded only the Twilight, as I may so say, of Probability,
suitable, I presume, to that state of Mediocrity and
Probationership He has been pleased to place us in here."


PART V

THE FOUNDATIONS OF STATISTICAL
INFERENCE


CHAPTER XXVII

THE NATURE OF STATISTICAL INFERENCE

1. THE Theory of Statistics, as it is now understood,1 can be
divided into two parts which are for many purposes better kept
distinct. The first function of the theory is purely descriptive.
It devises numerical and diagrammatic methods by which certain
salient characteristics of large groups of phenomena can be briefly
described; and it provides formulae by the aid of which we can
measure or summarise the variations in some particular character
which we have observed over a long series of events or instances.
The second function of the theory is inductive. It seeks to extend
its description of certain characteristics of observed events to
the corresponding characteristics of other events which have not
been observed. This part of the subject may be called the
Theory of Statistical Inference; and it is this which is closely
bound up with the theory of probability.

2. The union of these two distinct theories in a single science
is natural. If, as is generally the case, the development of
some inductive conclusion which shall go beyond the actually
observed instances is our ultimate object, we naturally choose
those modes of description, while we are engaged in our preliminary
investigation, which are most capable of extension
beyond the particular instances which they primarily describe.
But this union is also the occasion of a great deal of confusion. The
statistician, who is mainly interested in the technical methods of
his science, is less concerned to discover the precise conditions in
which a description can be legitimately extended by induction.
He slips somewhat easily from one to the other, and having
found a complete and satisfactory mode of description he
may take less pains over the transitional argument, which is
to permit him to use this description for the purposes of
generalisation.

1 See Yule, Introduction to Statistics, pp. 1-5, for a very interesting account
of the evolution of the meaning of the term statistics.

One or two examples will show how easy it is to slip from
description into generalisation. Suppose that we have a series
of similar objects one of the characteristics of which is under
observation; a number of persons, for example, whose age at
death has been recorded. We note the proportion who die at
each age, and plot a diagram which displays these facts graphically.
We then determine by some method of curve fitting a
mathematical frequency curve which passes with close approximation
through the points of our diagram. If we are given the
equation to this curve, the number of persons who are comprised
in the statistical series, and the degree of approximation (whether
to the nearest year or month) with which the actual age has been
recorded, we have a very complete and succinct account of one
particular characteristic of what may constitute a very large
mass of individual records. In providing this comprehensive
description the statistician has fulfilled his first function. But in
determining the accuracy with which this frequency curve can be
employed to determine the probability of death at a given age
in the population at large, he must pay attention to a new class
of considerations and must display a different kind of capacity.
He must take account of whatever extraneous knowledge may be
available regarding the sample of the population which came
under observation, and of the mode and conditions of the observations
themselves. Much of this may be of a vague kind, and most
of it will be necessarily incapable of exact, numerical, or statistical
treatment. He is faced, in fact, with the normal problems of
inductive science, one of the data, which must be taken into
account, being given in a convenient and manageable form by
the methods of descriptive statistics.
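The descriptive step can be sketched in a few lines. It produces a table of the proportion dying at each age, plus a fitted curve summarised by its parameters, the number of records, and the recording precision. The ages are invented. The normal curve, fitted by matching mean and standard deviation, is chosen only for brevity: the text names no particular curve or fitting method, and real mortality data are far from normal. Nothing here touches the second, inductive question of how far the curve holds for the population at large.

```python
from collections import Counter
from statistics import NormalDist

# Hypothetical ages at death, recorded to the nearest year
ages_at_death = [61, 64, 64, 67, 67, 67, 70, 70, 73]

def proportions_by_age(ages):
    """Proportion of the recorded persons who died at each age."""
    counts = Counter(ages)
    n = len(ages)
    return {age: counts[age] / n for age in sorted(counts)}

table = proportions_by_age(ages_at_death)
curve = NormalDist.from_samples(ages_at_death)  # mean and sample standard deviation

# The compact description: curve parameters, number of records, precision
summary = {
    "mean": curve.mean,
    "stdev": curve.stdev,
    "records": len(ages_at_death),
    "precision": "nearest year",
}
print(table)
print(summary)
```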

Or suppose, again, that we are given, over a series of years,
the marriage rate and the output of the harvest in a certain area
of population. We wish to determine whether there is any
apparent degree of correspondence between the variations of the
two within this field of observation. It is technically difficult to
measure such degree of correspondence as may appear to exist
between the variations in two series, the terms of which are in
some manner associated in couples (by coincidence, in this case,
of time and place). By the method of correlation tables and
correlation coefficients the descriptive statistician is able to effect
this object, and to present the inductive scientist with a highly
significant part of his data in a compact and instructive form.
But the statistician has not, in calculating these coefficients of
observed correlation, covered the whole ground of which the inductive
scientist must take cognisance. He has recorded the
results of the observations in circumstances where they cannot
be recorded so clearly without the aid of technical methods; but
the precise nature of the conditions in which the observations
took place and the numerous other considerations of one sort or
another, of which we must take account when we wish to
generalise, are not usually susceptible of numerical or statistical
expression.
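The descriptive half of the marriage-rate example is the correlation coefficient. What follows is a minimal sketch of Pearson's product-moment coefficient, applied to invented series. The number it returns summarises only the years observed. Whether the association extends to other years or places is exactly the question the coefficient by itself cannot settle.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two paired series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical yearly figures, paired by coincidence of time and place
harvest_index = [90, 100, 110, 95, 105]
marriage_rate = [7.6, 8.0, 8.4, 7.9, 8.2]  # per 1,000 population

r = pearson_r(harvest_index, marriage_rate)
print(r)
```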

The truth of this is obvious; yet, not unnaturally, the more
complicated and technical the preliminary statistical investigations
become, the more prone inquirers are to mistake the statistical
description for an inductive generalisation.1 This tendency,
which has existed in some degree, as, I think, the whole history of
the subject shows, from the eighteenth century down to the
present time, has been further encouraged by the terminology in
ordinary use. For several statistical coefficients are given the
same name when they are used for purely descriptive purposes,
as when corresponding coefficients are used to measure the force
or the precision of an induction. The term 'probable error,'
for example, is used both for the purpose of supplementing
and improving a statistical description, and for the
purpose of indicating the precision of some generalisation.
The term 'correlation' itself is used both to describe an
observed characteristic of particular phenomena and in the
enunciation of an inductive law which relates to phenomena
in general.
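The double use of "probable error" can be shown side by side. Assuming a normal distribution, the probable error is about 0.6745 times a standard deviation, the distance within which half the cases fall. Applied to the spread of the observations, it is a descriptive statistic. Divided by the square root of the number of observations, it is offered as the precision of the mean as an estimate, which is an inductive claim resting on further assumptions. The data are hypothetical, and the population standard deviation is used in both places for simplicity.

```python
import math
from statistics import NormalDist, mean, pstdev

# Factor converting a standard deviation into a probable error under normality
PE_FACTOR = NormalDist().inv_cdf(0.75)  # about 0.6745

def probable_error_of_observations(xs):
    """Descriptive use: half the observations lie within this distance of the
    mean, if their distribution is normal."""
    return PE_FACTOR * pstdev(xs)

def probable_error_of_mean(xs):
    """Inductive use: the claimed precision of the mean as an estimate for
    cases not observed, assuming independent draws from a stable population."""
    return PE_FACTOR * pstdev(xs) / math.sqrt(len(xs))

measurements = [9.8, 10.1, 10.0, 9.9, 10.2]  # hypothetical
print(mean(measurements),
      probable_error_of_observations(measurements),
      probable_error_of_mean(measurements))
```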

3. I have been at pains to enforce this contrast between
statistical description and statistical induction, because the
chapters which follow are to be entirely about the latter, whereas
nearly all statistical treatises are mainly concerned with the
former. My object will be to analyse, so far as I can, the logical
basis of statistical modes of argument. This involves a double
task. To mark down those which are invalid amongst arguments
having the support of authority is relatively easy.
The other branch of our investigation, namely, to analyse
the ground of validity in the case of those arguments the
force of which all of us do in fact admit, presents the same
kind of fundamental difficulties as we met with in the case
of Induction.

1 Cf. Whitehead, Introduction to Mathematics, p. 27: "There is no more
common error than to assume that, because prolonged and accurate mathematical
calculations have been made, the application of the result to some fact
of nature is absolutely certain."

4. The arguments with which we have to deal fall into three
main classes:

(i.) Given the probability relative to certain evidence of each
of a series of events, what are the probabilities, relative to the
same evidence, of various proportionate frequencies of occurrence
for the events over the whole series? Or more briefly, how often
may we expect an event to happen over a series of occasions, given
its probability on each occasion?

(ii.) Given the frequency with which an event has occurred
on a series of occasions, with what probability may we expect it
on a further occasion?

(iii.) Given the frequency with which an event has occurred
on a series of occasions, with what frequency may we probably
expect it on a further series of occasions?

In the first type of argument we seek to infer an unknown
statistical frequency from an a priori probability. In the second
type we are engaged on the inverse operation, and seek to base
the calculation of a probability on an observed statistical frequency.
In the third type we seek to pass from an observed
statistical frequency, not merely to the probability of an individual
occurrence, but to the probable values of other unknown statistical
frequencies.
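For the first type of argument, the classical calculation is the binomial distribution. It gives the probability of each possible frequency in n occasions when the event has probability p on every occasion. The sketch assumes that the occasions are independent and that p is the same on each, conditions the text goes on to treat as the exception rather than the rule. The second and third types, which run in the inverse direction, are not attempted here.

```python
from math import comb

def frequency_distribution(p, n):
    """Probability that the event occurs exactly k times in n occasions,
    for k = 0..n, assuming independence and the same probability p each time."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

dist = frequency_distribution(0.5, 4)
print(dist)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```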

Each of these types of argument can be further complicated
by being applied not simply to the occurrence of a simple event
but to the concurrence under given conditions of two or more
events. When this two or more dimensional classification replaces
the one dimensional, the theory becomes what is sometimes
termed Correlation, as distinguished from simple Statistical
Frequency.

5. In Chapter XXVIII. I touch briefly on the observed
phenomena which have given rise to the so-called Law of
Great Numbers, and the discovery of which first set statistical
investigation going. In Chapter XXIX. the first type of argument,
as classified above, is analysed, and the conditions which
are required for its validity are stated. The crucial problem
of attacking the second and third types of argument is the
subject of my concluding chapters.


CHAPTER XXVIII

THE LAW OF GREAT NUMBERS

Nature, indeed, has her customs, born of the recurrence of causes, but only
for the most part. New diseases flood the human race from time to time; so
that however many experiments you may have made upon deaths, you have not
thereby set limits to the nature of things such that it cannot vary in the future.
LEIBNIZ, in a letter to Bernoulli, December 3, 1703.

1. IT has always been known that, while some sets of events
invariably happen together, other sets generally happen together.
That experience shows one thing, while not always a sign of
another, to be a usual or probable sign of it, must have been one
of the earliest and most primitive forms of knowledge. If a dog
is generally given scraps at table, that is sufficient for him to judge
it reasonable to be there. But this kind of knowledge was slow
to be made precise. Numerous experiments must be carefully
recorded before we can know at all accurately how usual the
association is. It would take a dog a long time to find out that
he was given scraps except on fast days, and that there was the
same number of these in every year.

The necessary kind of knowledge began to be accumulated
during the seventeenth and eighteenth centuries by the early
statisticians. Halley and others began to construct mortality
tables; the proportions of the births of each sex were tabulated;
and so forth. These investigations brought to light a new fact
which had not been suspected previously, namely, that in certain
cases of partial association the degree of association, i.e. the proportion
of instances in which it existed, shows a very surprising
regularity, and that this regularity becomes more marked the
greater the number of the instances under consideration. It was
found, for example, not merely that boys and girls are born on
the whole in about equal proportions, but that the proportion,
which is not one of complete equality, tends everywhere, when
the number of recorded instances becomes large, to approximate
towards a certain definite figure.
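The regularity the early statisticians found can be mimicked, though not explained, by simulation. The sketch assumes each birth is independently a boy with a fixed probability. The value 0.515 is an arbitrary illustrative choice, not a figure from the text. The sketch records the proportion of boys after successively larger numbers of births: the proportion wanders widely for small numbers and settles near the fixed value for large ones. That real birth registers behave like this is an empirical finding, which the simulation simply presupposes.

```python
import random

def running_proportions(p_boy, checkpoints, seed=0):
    """Proportion of boys after each checkpoint number of simulated births,
    assuming independent births with a constant probability p_boy."""
    rng = random.Random(seed)
    boys = 0
    results = {}
    for births in range(1, max(checkpoints) + 1):
        boys += rng.random() < p_boy  # True counts as 1
        if births in checkpoints:
            results[births] = boys / births
    return results

for births, share in sorted(running_proportions(0.515, {10, 1_000, 100_000}).items()):
    print(births, round(share, 4))
```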

During the eighteenth century matters were not pushed much
further than this, that in certain cases, of which comparatively
few were known, there was this surprising regularity, increasing
in degree as the instances became more numerous. Bernoulli,
however, took the first step towards giving it a theoretical basis
by showing that, if the a priori probability is known throughout,
then (subject to certain conditions which he himself did not make
clear) in the long run a certain determinate frequency of occurrence
is to be expected. Süssmilch (Die göttliche Ordnung in den
Veränderungen des menschlichen Geschlechts, 1741) discovered a
theological interest in these regularities. Such ideas had become
sufficiently familiar for Gibbon to characterise the results of
probability as "so true in general, so fallacious in particular."
Kant found in them (as many later writers have done) some
bearing on the problem of Free Will.1

But with the nineteenth century came bolder theoretical
methods and a wider knowledge of facts. After proving his
extension of Bernoulli's Theorem,2 Poisson applied it to the
observed facts, and gave to the principle underlying these
regularities the title of the Law of Great Numbers. "Things of
every kind," he wrote,3 "are subject to a universal law which may
be called the law of great numbers. . . . From these examples of
every kind it follows that the universal law of great numbers is
already for us a general and incontestable fact, resulting from
experiments which never contradict themselves." This is
the language of exaggeration; it is also extremely vague. But
it is exciting; it seems to open up a whole new field to scientific
investigation; and it has had a great influence on subsequent
thought. Poisson seems to claim that, in the whole field of chance
and variable occurrence, there really exists, amidst the apparent
disorder, a discoverable system. Constant causes are always
at work and assert themselves in the long run, so that each class
of event does eventually occur in a definite proportion of cases.
It is not clear how far Poisson's result is due to a priori reasoning,
and how far it is a natural law based on experience; but it is
represented as displaying a certain harmony between natural
law and the a priori reasoning of probabilities.

1 In Idee zu einer allgemeinen Geschichte in weltbürgerlicher Absicht, 1784. For
a discussion of this passage and for the connection between Kant and Süssmilch,
see Lottin's Quetelet, pp. 367, 368.

2 See p. 345.

3 Recherches, pp. 7-12. Von Bortkiewicz (Kritische Betrachtungen, 1st part,
pp. 655-660) has maintained that Poisson intended to state his principle in a
less general way than that in which it has been generally taken, and that he was
misunderstood by Quetelet and others. If we attend only to Poisson's contributions
to Comptes Rendus in 1835 and 1836 and to the examples he gives
there, it is possible to make out a good case for thinking that he intended his
law to extend only to cases where certain strict conditions were fulfilled. But
this is not the spirit of his more popular writings or of the passage quoted above.
At any rate, it is the fashion in which Poisson influenced his contemporaries
that is historically interesting; and this is certainly not represented by Von
Bortkiewicz's interpretation.

Poisson's conception was mainly popularised through the
writings of Quetelet. In 1823 Quetelet visited Paris on an
astronomical errand, where he was introduced to Laplace and
came into touch with "the great French school." "My youth
and my zeal," he wrote in later years, "soon brought me into
contact with the most distinguished men of that period; let me
mention Fourier, Poisson, Lacroix, specially known, like Laplace,
for their excellent writings on the mathematical theory of
probabilities. . . . It was, then, in the midst of the scholars,
statisticians, and economists of that time that I began my work."1
Shortly afterwards began his long series of papers, extending down
to 1873, on the application of Probability to social statistics. He
wrote a text-book on Probability in the form of letters for the
instruction of the Prince Consort.

1 For the details of the life of Quetelet and for a very full discussion of his
writings with special reference to Probability, see Lottin's Quetelet, statisticien et
sociologue.

Before accepting in 1815 at the age of nineteen (with a view to
a livelihood) a professorship of mathematics, Quetelet had studied
as an art student and written poetry; a year later an opera, of
which he was part-author, was produced at Ghent. The character
of his scientific work is in keeping with these beginnings. There
is scarcely any permanent, accurate contribution to knowledge
which can be associated with his name. But suggestions, projects,
far-reaching ideas he could both conceive and express, and
he has a very fair claim, I think, to be regarded as the parent of
modern statistical method.

Quetelet very much increased the number of instances of the
Law of Great Numbers, and also brought into prominence a
slightly variant type of it, of which a characteristic example is
the law of height, according to which the heights of any considerable
sample taken from any population tend to group themselves
according to a certain well-known curve. His instances were
chiefly drawn from social statistics, and many of them were of a
kind well calculated to strike the imagination: the regularity of
the number of suicides, "the frightening exactness with which
crimes reproduce themselves," and so forth. Quetelet writes
with an almost religious awe of these mysterious laws, and
certainly makes the mistake of treating them as being as
adequate and complete in themselves as the laws of physics,
and as little needing any further analysis or explanation.1
Quetelet's sensational language may have given a considerable
impetus to the collection of social statistics, but it also involved
statistics in a slight element of suspicion in the minds of some
who, like Comte, regarded the application of the mathematical
calculus of probability to social science as "purely chimerical
and, consequently, altogether vicious." The suspicion of
quackery has not yet disappeared. Quetelet belongs, it must be
admitted, to the long line of brilliant writers, not yet extinct, who
have prevented Probability from becoming, in the scientific salon,
perfectly respectable. There is still about it for scientists a
smack of astrology, of alchemy.

The progress of the conception since the time of Quetelet has
been steady and uneventful; and long strides towards this perfect
respectability have been taken. Instances have been multiplied
and the conditions necessary for the existence of statistical
stability have been to some extent analysed. While the most
fruitful applications of these methods have still been perhaps,
as at first, in social statistics and in errors of observation, a
number of uses for them have been discovered in quite recent
times in the other sciences; and the principles of Mendelism
have opened out for them a great field of application throughout
biology.

1 Compare, for instance, the following passage from Recherches sur le penchant
au crime: "It seems to me that whatever relates to the human species, considered
en masse, is of the order of physical facts; the greater the number of individuals,
the more the individual will is effaced, leaving predominant the series of general
facts which depend on general causes. . . . It is these causes that we must
seek to grasp, and as soon as we know them, we shall determine their effects for
society as we determine effects by causes in the physical sciences."
2. The existence of numerous instances of the Law of Great
Numbers, or of something of the kind, is absolutely essential for
the importance of Statistical Induction. Apart from this the more
precise parts of statistics, the collection of facts for the prediction
of future frequencies and associations, would be nearly useless.
But the 'Law of Great Numbers' is not at all a good name for the
principle which underlies Statistical Induction. The 'Stability
of Statistical Frequencies' would be a much better name for it.
The former suggests, as perhaps Poisson intended to suggest, but
what is certainly false, that every class of event shows statistical
regularity of occurrence if only one takes a sufficient number of
instances of it. It also encourages the method of procedure, by
which it is thought legitimate to take any observed degree of
frequency or association, which is shown in a fairly numerous
set of statistics, and to assume with insufficient investigation
that, because the statistics are numerous, the observed degree of
frequency is therefore stable. Observation shows that some
statistical frequencies are, within narrower or wider limits, stable.
But stable frequencies are not very common, and cannot be
assumed lightly.

The gradual discovery, that there are certain classes of
phenomena, in which, though it is impossible to predict what will
happen in each individual case, there is nevertheless a regularity
of occurrence if the phenomena be considered together in successive
sets, gives the clue to the abstract inquiry upon which we
are about to embark.


CHAPTER XXIX

THE USE OF A PRIORI PROBABILITIES FOR THE PREDICTION OF
STATISTICAL FREQUENCY: THE THEOREMS OF BERNOULLI,
POISSON, AND TCHEBYCHEFF

"This, then, is the problem which I have resolved to publish in this place, after having kept it back for twenty years, and whose novelty, as well as its very great utility joined to an equal difficulty, can add weight and value to all the other chapters of this doctrine." BERNOULLI.¹

1. BERNOULLI'S Theorem is generally regarded as the central theorem of statistical probability. It embodies the first attempt to deduce the measures of statistical frequencies from the measures of individual probabilities, and it is a sufficient fruit of the twenty years which Bernoulli alleges that he spent in reaching his result, if out of it the conception first arose of general laws amongst masses of phenomena, in spite of the uncertainty of each particular case. But, as we shall see, the theorem is only valid subject to stricter qualifications than have always been remembered, and in conditions which are the exception, not the rule.

The problem to be discussed in this chapter is as follows: Given a series of occasions, the probability² of the occurrence of a certain event at each of which is known relative to certain initial data h, on what proportion of these occasions may we reasonably anticipate the occurrence of the event? Given, that is to say, the individual probability of each of a series of events a priori, what statistical frequency of occurrence of these events is to be anticipated over the whole series? Beginning with Bernoulli's Theorem, we will consider the various solutions of this problem which have been propounded, and endeavour to

¹ Ars Conjectandi, p. 227.

² In the simplest cases, dealt with by Bernoulli, these probabilities are all supposed equal.
determine the proper limits within which each method has validity.

2. Bernoulli's Theorem in its simplest form is as follows: If the probability of an event's occurrence under certain conditions is p, then, if these conditions are present on m occasions, the most probable number of the event's occurrences is mp (or the nearest integer to this), i.e. the most probable proportion of its occurrences to the total number of occasions is p; further, the probability that the proportion of the event's occurrences will diverge from the most probable proportion p by less than a given amount b increases as m increases, the value of this probability being calculable by a process of approximation.

The probability of the event's occurring n times and failing m − n times out of the m occasions is (subject to certain conditions to be elucidated later) $p^n q^{m-n}$ multiplied by the coefficient of this expression in the expansion of $(p + q)^m$, where p + q = 1. If we write n = mp − h, this term is

$$\frac{m!}{(mp-h)!\,(mq+h)!}\,p^{mp-h}\,q^{mq+h}.$$

It is easily shown that this is a maximum when h = 0, i.e. when n = mp (or the nearest integer to this, where mp is not integral). This result constitutes the first part of Bernoulli's Theorem.
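As a numerical illustration of this first part, the following Python sketch (ours, not the author's) evaluates the binomial term for every n and picks the largest. It assumes Bernoulli's conditions hold, i.e. independent trials with a common probability p; the function names are hypothetical. The exact maximiser is ⌊(m + 1)p⌋, which always lies within one of mp, so "the nearest integer to mp" should be read as an approximation.

```python
from math import comb

def binomial_term(m: int, n: int, p: float) -> float:
    """Probability of exactly n occurrences in m trials, assuming independent
    trials that all have the same probability p (Bernoulli's conditions)."""
    q = 1.0 - p
    return comb(m, n) * p**n * q**(m - n)

def most_probable_count(m: int, p: float) -> int:
    """Return the n (0 <= n <= m) that maximises the binomial term, by direct search."""
    return max(range(m + 1), key=lambda n: binomial_term(m, n, p))

if __name__ == "__main__":
    for m, p in [(50, 0.3), (37, 0.6), (100, 0.5)]:
        n_star = most_probable_count(m, p)
        print(f"m={m}, p={p}: most probable n = {n_star}, mp = {m * p:.2f}")
```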

For the second part of the theorem some method of approximation is required. Provided that m is large, we can simplify the expression

$$\frac{m!}{(mp-h)!\,(mq+h)!}\,p^{mp-h}\,q^{mq+h}$$

by means of Stirling's Theorem, and obtain as its approximate value

$$\frac{1}{\sqrt{2\pi mpq}}\,e^{-\frac{h^2}{2mpq}}.$$

As before, this is a maximum when h = 0, i.e. when n = mp.

It is possible, of course, by more complicated formulae to obtain closer approximations than this.¹ But there is an objection which can be raised to this approximation, quite distinct from the fact that it does not furnish a result correct to as many places of decimals as it might. This is that the approximation is independent of the sign of h, whereas the original expression is not thus independent. That is to say, the approximation implies a symmetrical distribution for different values of h about

¹ See, e.g., Bowley, Elements of Statistics, p. 298. The objection about to be raised does not apply to these closer approximations.
the value for h = 0, while the expression under approximation is unsymmetrical. It is easily seen that this want of symmetry is appreciable unless mpq is large. We ought, therefore, to have laid it down as a condition of our approximation, not only that m must be large, but also that mpq must be large. Unlike most of my criticisms, this is a mathematical, rather than a logical, point. I recur to it in § 15.
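The asymmetry objection can be seen numerically. The sketch below, an illustration of ours rather than anything in the source, compares the exact binomial terms at n = mp − h and n = mp + h with the symmetric Stirling (normal) approximation $\frac{1}{\sqrt{2\pi mpq}}e^{-h^2/2mpq}$. It assumes mp is a whole number and works with logarithms (via `math.lgamma`) so that large m does not overflow; the function names are hypothetical.

```python
from math import exp, lgamma, log, pi, sqrt

def exact_term(m: int, p: float, h: int) -> float:
    """Exact binomial probability of n = mp - h occurrences (mp assumed integral)."""
    q = 1.0 - p
    n = round(m * p) - h
    log_coeff = lgamma(m + 1) - lgamma(n + 1) - lgamma(m - n + 1)
    return exp(log_coeff + n * log(p) + (m - n) * log(q))

def normal_term(m: int, p: float, h: float) -> float:
    """Stirling (normal) approximation; it depends on h only through h**2."""
    q = 1.0 - p
    return exp(-h * h / (2 * m * p * q)) / sqrt(2 * pi * m * p * q)

def asymmetry(m: int, p: float, h: int) -> float:
    """Relative gap between the exact terms at +h and -h; zero means symmetric."""
    a, b = exact_term(m, p, h), exact_term(m, p, -h)
    return abs(a - b) / max(a, b)

if __name__ == "__main__":
    for m in (100, 10_000):  # mpq is 4.75 and 475 respectively when p = 0.05
        print(m, round(asymmetry(m, 0.05, 3), 4))
```

With p = 0.05 the exact terms at h = ±3 differ by roughly a fifth when mpq is under 5, but by well under one per cent when mpq is in the hundreds, which is the condition the text asks for.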

"By a fiction which will make the calculations easier" (to quote Bertrand), we now replace the integer h by a continuous variable z and argue that the probability that the amount of the divergence from the most probable value mp will lie between z and z + dz is

$$\frac{1}{\sqrt{2\pi mpq}}\,e^{-\frac{z^2}{2mpq}}\,dz.$$

This 'fiction' will do no harm so long as it is remembered that we are now dealing with a particular kind of approximation. The probability that the divergence h from the most probable value mp will be less than some given quantity a is, therefore,

$$\frac{1}{\sqrt{2\pi mpq}}\int_{-a}^{+a} e^{-\frac{z^2}{2mpq}}\,dz.$$

If we put $\frac{z}{\sqrt{2mpq}} = t$, this is equal to

$$\frac{2}{\sqrt{\pi}}\int_0^{\frac{a}{\sqrt{2mpq}}} e^{-t^2}\,dt.$$

Thus, if we write $a = \gamma\sqrt{2mpq}$, the probability that the number of occurrences will lie between

$$mp - \gamma\sqrt{2mpq} \quad\text{and}\quad mp + \gamma\sqrt{2mpq}$$

is measured by²

$$\frac{2}{\sqrt{\pi}}\int_0^{\gamma} e^{-t^2}\,dt.$$

This same expression measures

¹ The replacement of the integer h by the continuous variable z may render the formula rather deceptive. It is certain, for example, that the error does not lie between h and h + 1.

² The above proof follows the general lines of Bertrand's (Calcul des probabilités, chap. iv.). Some writers, aiming at rather more precision, give the result

$$\frac{2}{\sqrt{\pi}}\int_0^{\gamma} e^{-t^2}\,dt + \frac{e^{-\gamma^2}}{\sqrt{2\pi mpq}}$$

(e.g. Laplace, by the use of Euler's Theorem, and more recently Czuber,


340  A  TREATISE  ON  PROBABILITY  IT.  v 

the probability that the proportion of occurrences will lie between

$$p + \gamma\sqrt{\frac{2pq}{m}} \quad\text{and}\quad p - \gamma\sqrt{\frac{2pq}{m}}.$$

The different values of the integral $\frac{2}{\sqrt{\pi}}\int_0^{\gamma} e^{-t^2}\,dt = \Theta(\gamma)$ are given in tables.¹
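The tabulated function $\Theta(\gamma) = \frac{2}{\sqrt{\pi}}\int_0^{\gamma} e^{-t^2}\,dt$ is what is now usually called the error function, which Python's standard library provides as `math.erf`. The sketch below is ours, not the author's; under Bernoulli's assumptions (independent trials, common p) it compares Θ(γ) with the exact binomial probability that the number of occurrences lies within $mp \pm \gamma\sqrt{2mpq}$. The values m = 10,000 and p = 0.3 are arbitrary.

```python
from math import erf, exp, lgamma, log, sqrt

def theta(gamma: float) -> float:
    """Theta(gamma) = (2/sqrt(pi)) * integral from 0 to gamma of exp(-t^2) dt = erf(gamma)."""
    return erf(gamma)

def exact_within(m: int, p: float, gamma: float) -> float:
    """Exact binomial probability that |n - mp| <= gamma * sqrt(2mpq)."""
    q = 1.0 - p
    half_width = gamma * sqrt(2 * m * p * q)
    log_m_fact = lgamma(m + 1)
    total = 0.0
    for n in range(m + 1):
        if abs(n - m * p) <= half_width:
            total += exp(log_m_fact - lgamma(n + 1) - lgamma(m - n + 1)
                         + n * log(p) + (m - n) * log(q))
    return total

if __name__ == "__main__":
    for gamma in (0.5, 1.0, 2.0):
        print(gamma, round(theta(gamma), 4), round(exact_within(10_000, 0.3, gamma), 4))
```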

The probability that the proportion of occurrences will lie between given limits varies with the magnitude of $\sqrt{\frac{2pq}{m}}$, and this expression is sometimes used, therefore, to measure the 'precision' of the series. Given the a priori probabilities, the precision varies inversely with the square root of the number of instances. Thus, while the probability that the absolute divergence will be less than a given amount a decreases, the probability that the corresponding proportionate divergence (i.e. the absolute divergence divided by the number of instances) will be less than a given amount b increases, as the number of instances increases. This completes the second part of Bernoulli's Theorem.
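The contrast between absolute and proportionate divergence can be tabulated exactly. The sketch below is ours; it assumes independent trials with a common probability p = ½ and uses a = 10 and b = 0.02, values chosen only for the example. As m grows, the first column should fall and the second rise.

```python
from math import exp, lgamma, log

def binom_pmf(m: int, n: int, p: float) -> float:
    """Binomial probability of n occurrences in m independent trials, via logarithms."""
    q = 1.0 - p
    return exp(lgamma(m + 1) - lgamma(n + 1) - lgamma(m - n + 1)
               + n * log(p) + (m - n) * log(q))

def prob_abs_within(m: int, p: float, a: float) -> float:
    """P(|n - mp| < a): the absolute divergence is smaller than a."""
    return sum(binom_pmf(m, n, p) for n in range(m + 1) if abs(n - m * p) < a)

def prob_prop_within(m: int, p: float, b: float) -> float:
    """P(|n/m - p| < b): the proportionate divergence is smaller than b."""
    return sum(binom_pmf(m, n, p) for n in range(m + 1) if abs(n / m - p) < b)

if __name__ == "__main__":
    for m in (100, 400, 1600, 6400):
        print(m, round(prob_abs_within(m, 0.5, 10), 3), round(prob_prop_within(m, 0.5, 0.02), 3))
```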

3. Bernoulli himself was not acquainted with Stirling's theorem, and his proof differs a good deal from the proof outlined in § 2. His final enunciation of the theorem is as follows:² If in each of a given series of experiments there are r contingencies favourable to a given event out of a total number of contingencies t, so that r/t is the probability of the event at each experiment, then, given any degree of probability c, it is possible to make such a number of experiments that the probability that the proportionate number of the event's occurrences will lie between (r + 1)/t and (r − 1)/t is greater than c.
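Bernoulli's enunciation can be turned into a small search. The sketch below is ours and is not a reconstruction of Bernoulli's own proof; it computes the binomial probability exactly, assumes 0 < r < t, counts the end-points (r − 1)/t and (r + 1)/t as inside the interval, and tries only numbers of experiments that are multiples of t. All of these choices, and the function names, are assumptions of the sketch.

```python
from math import exp, lgamma, log

def binom_pmf(m: int, n: int, p: float) -> float:
    """Binomial probability of n occurrences in m independent trials, via logarithms."""
    q = 1.0 - p
    return exp(lgamma(m + 1) - lgamma(n + 1) - lgamma(m - n + 1)
               + n * log(p) + (m - n) * log(q))

def prob_within(m: int, r: int, t: int) -> float:
    """P((r-1)/t <= n/m <= (r+1)/t) when each of m trials succeeds with probability r/t."""
    p = r / t
    lo = max(0, -((-m * (r - 1)) // t))  # ceil(m(r-1)/t) in exact integer arithmetic
    hi = min(m, (m * (r + 1)) // t)       # floor(m(r+1)/t)
    return sum(binom_pmf(m, n, p) for n in range(lo, hi + 1))

def experiments_needed(r: int, t: int, c: float, max_m: int = 1_000_000) -> int:
    """Smallest multiple of t for which the probability above exceeds c."""
    if not 0 < r < t:
        raise ValueError("need 0 < r < t")
    m = t
    while m <= max_m:
        if prob_within(m, r, t) > c:
            return m
        m += t
    raise ValueError("no suitable number of experiments up to max_m")

if __name__ == "__main__":
    print(experiments_needed(3, 5, 0.99))
    print(experiments_needed(30, 50, 1000 / 1001))
```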


Wahrscheinlichkeitsrechnung, vol. i. p. 121). As the whole formula is approximate, the simpler expression given in the text is probably not less satisfactory in practice. See also Czuber, Entwicklung, pp. 76, 77, and Eggenberger, Beiträge zur Darstellung des Bernoullischen Theorems.

¹ A list of the principal tables is given by Czuber, loc. cit. vol. i. p. 122.

² Ars Conjectandi, p. 236 (I have translated freely). There is a brief account of Bernoulli's proof in Todhunter's History, pp. 71, 72. The problem is dealt with by Laplace, Théorie analytique, livre ii. chap. iii. For an account of Laplace's proof see Todhunter's History, pp. 548-553.
4. We seem, therefore, to have proved that, if the a priori probability of an event under certain conditions is p, the proportion of times most probable a priori for the event's occurrence on a series of occasions where the conditions are satisfied is also p, and that if the series is a long one the proportion is very unlikely to differ widely from p. This amounts to the principle which Ellis¹ and Venn have employed as the defining axiom of probability, save that if the series is 'long enough' the proportion, according to them, will certainly be p. Laplace² believed that the theorem afforded a demonstration of a general law of nature, and in his second edition, published in 1814, he replaces³ the eloquent dedication, À Napoléon-le-Grand, which prefaces the edition of 1812, by an explanation that Bernoulli's Theorem must always bring about the eventual downfall of a great power which, drunk with the love of conquest, aspires to a universal domination: "this is yet another result of the calculus of probabilities, confirmed by numerous and disastrous experiences."

5. Such is the famous Theorem of Bernoulli, which some have believed⁴ to have a universal validity and to be applicable to all 'properly calculated' probabilities. Yet the theorem exhibits algebraical rather than logical insight. And, for reasons about to be given, it will have to be conceded that it is only true of a special class of cases and requires conditions, before it can be legitimately applied, of which the fulfilment is rather the exception than the rule. For consider the case of a coin of which it is given that the two faces are either both heads or both tails: at every toss, provided that the results of the other tosses are unknown, the probability of heads is ½ and the probability of tails is ½; yet the probability of m heads and m tails in 2m tosses

¹ On the Foundations of the Theory of Probabilities: "If the probability of a given event be correctly determined, the event will on a long run of trials tend to recur with frequency proportional to this probability. This is generally proved mathematically. It seems to me to be true a priori. . . . I have been unable to sever the judgment that one event is more likely to happen than another from the belief that in the long run it will occur more frequently."

² "From the preceding theorem we may draw this consequence, which ought to be regarded as a general law, namely, that the ratios of the effects of nature are very nearly constant when these effects are considered in great numbers."

³ Introduction, pp. liii, liv.

⁴ Even by Mr. Bradley, Principles of Logic. After criticising Venn's view he adds: "It is false that the chances must be realised in a series. It is, however, true that they most probably will be, and true again that this probability is increased, the greater the length we give to our series."
is zero, and it is certain a priori that there will be either 2m heads or none. Clearly Bernoulli's Theorem is inapplicable to such a case. And this is but an extreme case of a normal condition.
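The coin example can be checked with exact fractions. In this sketch (ours) we assume, as the example implies, that the coin is equally likely to be two-headed or two-tailed, and we set the resulting distribution of heads beside the binomial distribution which Bernoulli's Theorem would prescribe for independent tosses each having probability ½. The single-toss probability and the expected number of heads agree; the distributions do not.

```python
from fractions import Fraction
from math import comb
from typing import Dict

def two_faced_distribution(tosses: int) -> Dict[int, Fraction]:
    """Distribution of the number of heads when the coin is two-headed or
    two-tailed, each with probability 1/2 (an assumption of this sketch)."""
    dist = {k: Fraction(0) for k in range(tosses + 1)}
    dist[tosses] += Fraction(1, 2)  # two-headed: every toss falls heads
    dist[0] += Fraction(1, 2)       # two-tailed: no toss falls heads
    return dist

def bernoulli_distribution(tosses: int) -> Dict[int, Fraction]:
    """Binomial distribution Bernoulli's Theorem would give with p = q = 1/2."""
    return {k: Fraction(comb(tosses, k), 2**tosses) for k in range(tosses + 1)}

def prob_heads_on_single_toss() -> Fraction:
    """Before any result is known, a single toss is heads with probability 1/2."""
    return Fraction(1, 2) * 1 + Fraction(1, 2) * 0

if __name__ == "__main__":
    m = 5
    print(two_faced_distribution(2 * m)[m], bernoulli_distribution(2 * m)[m])
```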

For the first stage in the proof of the theorem assumes that, if p is the probability of one occurrence, $p^r$ is the probability of r occurrences running. Our discussion of the theorems of multiplication will have shown how considerable an assumption this involves. It assumes that a knowledge of the fact that the event has occurred on every one of the first r − 1 occasions does not in any degree affect the probability of its occurrence on the rth. Thus Bernoulli's Theorem is only valid if our initial data are of such a character that additional knowledge as to the proportion of failures and successes in one part of a series of cases is altogether irrelevant to our expectation as to the proportion in another part. If, for example, the initial probability of the occurrence of an event under certain circumstances is one in a million, we may only apply Bernoulli's Theorem to evaluate our expectation over a million trials if our original data are of such a character that, even after the occurrence of the event in every one of the first million trials, the probability in the light of this additional knowledge that the event will occur on the next occasion is still no more than one in a million.

Such a condition is very seldom fulfilled. If our initial probability is partly founded upon experience, it is clear that it is liable to modification in the light of further experience. It is, in fact, difficult to give a concrete instance of a case in which the conditions for the application of Bernoulli's Theorem are completely fulfilled. At the best we are dealing in practice with a good approximation, and can assert that no realised series of moderate length can much affect our initial probability. If we wish to employ the expression $\frac{2}{\sqrt{\pi}}\int_0^{\gamma} e^{-t^2}\,dt$ we are in a worse position. For this is an approximate formula which requires for its validity that the series should be long; whilst it is precisely in this event, as we have seen above, that the use of Bernoulli's Theorem is more than usually likely to be illegitimate.

6. The conditions which have been described above can be expressed precisely as follows:
Let ${}_nx_m$ represent the statement that the event has occurred on m out of n occasions and has not occurred on the others; and let $x_1/h = p$, where h represents our a priori data, so that p is the a priori probability of the event in question. Bernoulli's Theorem then requires a series of conditions, of which the following is typical: $x_{n+1}/{}_nx_m\,h = x_1/h$, i.e. the probability of the event on the (n + 1)th occasion must be unaffected by our knowledge of its proportionate frequency on the first n occasions, and must be exactly equal to its a priori probability before the first occasion.

Let us select one of these conditions for closer consideration. If $y_r$ represents the statement that the event has occurred on each of r successive occasions, $y_r/h = x_r/y_{r-1}h \cdot y_{r-1}/h$, and so on, so that

$$y_r/h = \prod_{s=1}^{r} x_s/y_{s-1}h.$$

Hence, if we are to have $y_r/h = p^r$, we must have $x_s/y_{s-1}h = p$ for all values of s from 1 to r. But in many particular examples $x_s/y_{s-1}h$ increases with s, so that $y_r/h > p^r$. Bernoulli's Theorem, that is to say, tends, if it is carelessly applied, to exaggerate the rate at which the probability of a given divergence from the most probable decreases as the divergence increases. If we are given a penny of which we have no reason to doubt the regularity, the probability of heads at the first toss is ½; but if heads fall at every one of the first 999 tosses, it becomes reasonable to estimate the probability of heads at the thousandth toss at much more than ½. For the a priori probability of its being a conjurer's penny, or otherwise biassed so as to fall heads almost invariably, is not usually so infinitesimally small as $(\frac{1}{2})^{1000}$. We can only apply Bernoulli's Theorem with rigour for a prediction as to the penny's behaviour over a series of a thousand tosses if we have a priori such exhaustive knowledge of the penny's constitution and of the other conditions of the problem that 999 heads running would not cause us to modify in any respect our prediction a priori.
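A minimal Bayesian sketch of the penny example, on assumptions of our own choosing: with prior probability ε the penny is a conjurer's coin, equally likely to be one that always falls heads or one that always falls tails (so that heads at the first toss keeps probability exactly ½), and otherwise it is a fair coin tossed independently. The value ε = 10⁻⁶ and the function names are illustrative only. The code tracks the probability of heads at the next toss after r heads running, and shows that it rises with r instead of staying at ½.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def prob_all_heads(r: int, eps: Fraction) -> Fraction:
    """P(heads on each of the first r tosses) under the assumed prior."""
    always_heads = eps / 2                   # an always-heads coin gives r heads for every r
    always_tails = eps / 2 if r == 0 else 0  # an always-tails coin only "succeeds" when r == 0
    fair = (1 - eps) * HALF**r               # a fair coin gives r heads with probability (1/2)^r
    return always_heads + always_tails + fair

def prob_next_head(r: int, eps: Fraction) -> Fraction:
    """P(heads at toss r + 1 | heads at each of the first r tosses)."""
    return prob_all_heads(r + 1, eps) / prob_all_heads(r, eps)

if __name__ == "__main__":
    eps = Fraction(1, 10**6)  # illustrative prior weight of a conjurer's penny
    for r in (0, 10, 20, 999):
        print(r, float(prob_next_head(r, eps)))
```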

7. It seldom happens, therefore, that we can apply Bernoulli's Theorem with reference to a long series of natural events. For in such cases we seldom possess the exhaustive knowledge which is necessary. Even where the series is short, the perfectly rigorous application of the Theorem is not likely to be legitimate, and some degree of approximation will be involved in utilising its results.

Not so infrequently, however, artificial series can be devised in which the assumptions of Bernoulli's Theorem are relatively legitimate.¹ Given, that is to say, a proposition $a_1$, some series $a_1 a_2 \ldots$ can be found which satisfies the conditions:

(i.) $a_1/h = a_2/h = \ldots = a_r/h$.


Adherents of the Frequency Theory of Probability, who use the principal conclusion of Bernoulli's Theorem as the defining property of all probabilities, sometimes seem to mean no more than that, relative to given evidence, every proposition belongs to some series, to the members of which Bernoulli's Theorem is rigorously applicable. But the natural series, the series, for example, in which we are most often interested, where the a's are alike in being accompanied by certain specified conditions c,
is not, as a rule, rigorously subject to the Theorem. Thus 'the probability of a in certain conditions c is 2/9' is not in general equivalent, as has sometimes been supposed, to 'It is 500 to 1 that in 90,000 occurrences of c, a will not occur more than 20,200 times, and 500 to 1 that it will not occur less than 19,800 times.'

8. Bernoulli's Theorem supplies the simplest formula by which we can attempt to pass from the a priori probabilities of each of a series of events to a prediction of the statistical frequency of their occurrence over the whole series. We have seen that Bernoulli's Theorem involves two assumptions, one (in the form in which it is usually enunciated) tacit and the other explicit. It is assumed, first, that a knowledge of what has occurred at some of the trials would not affect the probability of what may occur at any of the others; and it is assumed, secondly, that these probabilities are all equal a priori. It is assumed, that is to say, that the probability of the event's occurrence at the rth trial is equal a priori to its probability at the sth trial, and, further, that it is unaffected by a knowledge of what may actually have occurred at the sth trial.

A formula which dispenses with the explicit assumption of equal a priori probabilities at every trial was proposed by Poisson,² and is usually known by his name. It does not dispense,

¹ In the discussion in Chapter XVI, p. 170, of the probability of a divergence from an equality of heads and tails in coin-tossing, an example has been given of the construction of an artificial series in which the application of Bernoulli's Theorem is more legitimate than in the natural series.

² Recherches, pp. 240 et seq.
however, with the other, inexplicit, assumption. The difference between Poisson's Theorem and Bernoulli's is best shown by reference to the ideal case of balls drawn from an urn. The typical example for the valid application of Bernoulli's Theorem is that of balls drawn from a single urn, containing black and white balls in a known proportion, and replaced after each drawing, or of balls drawn from a series of urns, each containing black and white balls in the same known proportion. The typical example for Poisson's Theorem is that of balls drawn from a series of urns, each containing black and white balls in different known proportions.

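Poisson's Theorem may be enunciated as follows:¹ Let s trials be made, and at the λth trial (λ = 1, 2 . . . s) let the probabilities for the occurrence and non-occurrence of the event be $p_\lambda$, $q_\lambda$ respectively. Then, if $\frac{\sum p_\lambda}{s} = p$, the probability that the number of occurrences m of the event in the s trials will lie between the limits sp ± l is given by

$$\frac{2}{\sqrt{\pi}}\int_0^{\frac{l}{k\sqrt{s}}} e^{-t^2}\,dt + \frac{1}{k\sqrt{\pi s}}\,e^{-\frac{l^2}{k^2 s}},$$

where $k^2 = \frac{2}{s}\sum_{\lambda=1}^{s} p_\lambda q_\lambda$. If we put $\frac{l}{k\sqrt{s}} = \gamma$, this may be written in a form corresponding to that of Bernoulli's Theorem,² namely: The probability that the number of occurrences of the event will lie between $sp \pm \gamma k\sqrt{s}$ is given by

$$\frac{2}{\sqrt{\pi}}\int_0^{\gamma} e^{-t^2}\,dt + \frac{e^{-\gamma^2}}{k\sqrt{\pi s}}.$$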

9. This is a highly ingenious theorem and extends the application of Bernoulli's results to some important types of cases. It embraces, for example, the case in which the successive terms of a series are drawn from distinct populations known to be characterised by differing statistical frequencies, no further complication being necessary beyond the calculation of two simple functions of these frequencies and of the number of terms in the series. But it is important not to exaggerate the degree to which Poisson's method has extended the application of Bernoulli's results. Poisson's Theorem leaves untouched all those cases in which the probabilities of some of the terms in the series of events can be influenced by a knowledge of how some of the other terms in the series have turned out.

¹ For the proof see Poisson, Recherches, loc. cit., or Czuber, Wahrscheinlichkeitsrechnung, vol. i.

² For the corresponding form of Bernoulli's Theorem see p. 339 (footnote).
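Poisson's result can be compared with an exact calculation. The sketch below is ours: it builds the exact distribution of the number of occurrences for independent trials with unequal probabilities $p_\lambda$ by a simple convolution, and compares the probability of falling within $sp \pm \gamma\sqrt{2\sum p_\lambda q_\lambda}$ with $\Theta(\gamma) = \text{erf}(\gamma)$. The two "simple functions" needed are the mean probability p and the sum $\sum p_\lambda q_\lambda$; the probabilities in the example (three kinds of urn drawn from in rotation) are arbitrary.

```python
from math import erf, sqrt
from typing import List, Tuple

def occurrence_distribution(ps: List[float]) -> List[float]:
    """Exact distribution of the number of occurrences when trial i succeeds
    with probability ps[i] and the trials are independent."""
    dist = [1.0]
    for p in ps:
        new = [0.0] * (len(dist) + 1)
        for k, weight in enumerate(dist):
            new[k] += weight * (1.0 - p)  # this trial fails
            new[k + 1] += weight * p      # this trial succeeds
        dist = new
    return dist

def compare_with_poisson(ps: List[float], gamma: float) -> Tuple[float, float]:
    """Return (exact probability, erf(gamma)) for |m - s*p_bar| <= gamma*sqrt(2*sum(p*q))."""
    s = len(ps)
    p_bar = sum(ps) / s
    spread = sqrt(2 * sum(p * (1.0 - p) for p in ps))
    dist = occurrence_distribution(ps)
    exact = sum(w for m, w in enumerate(dist) if abs(m - s * p_bar) <= gamma * spread)
    return exact, erf(gamma)

if __name__ == "__main__":
    ps = [0.1, 0.5, 0.9] * 400  # three kinds of urn, drawn from in rotation
    print(compare_with_poisson(ps, 1.0))
```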

Amongst these cases two types can be distinguished. In the first type such knowledge would lead us to discriminate between the conditions to which the different instances are subject. If, for example, balls are drawn from a bag containing black and white balls in known proportions, and not replaced, the knowledge whether or not the first ball drawn was black affects the probability of the second ball's being black, because it tells us how the conditions in which the second ball is drawn differ from those in which the first ball was drawn. In the second type such knowledge does not lead us to discriminate between the conditions to which the different instances are subject, but it leads us to modify our opinion as to the nature of the conditions which apply to all the terms alike. If, for instance, balls are drawn from a bag which is one, but it is not certainly known which, out of a number of bags containing black and white balls in differing proportions, the knowledge of the colour of the first ball drawn affects the probabilities at the second drawing, because it throws some light upon the question as to which bag is being drawn from.
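The two types can be contrasted with exact fractions. In the sketch below (ours; the bag contents are arbitrary) the first function draws twice without replacement from a single known bag, so the first colour changes the material conditions of the second drawing. The second supposes one of several bags, each equally likely a priori, with balls replaced after each drawing, so nothing material changes between the drawings and yet the first colour is still informative. In both examples the unconditional probability of black at the second drawing is ½.

```python
from fractions import Fraction
from typing import List

def second_black_given_first_black_no_replacement(black: int, white: int) -> Fraction:
    """First type: one known bag, balls not replaced."""
    return Fraction(black - 1, black + white - 1)

def second_black_given_first_black_unknown_bag(proportions: List[Fraction]) -> Fraction:
    """Second type: an unknown one of several bags (equally likely a priori),
    balls replaced after each drawing, so only our knowledge changes."""
    n = len(proportions)
    p_first = sum(Fraction(1, n) * b for b in proportions)     # P(first black)
    p_both = sum(Fraction(1, n) * b * b for b in proportions)  # P(first and second black)
    return p_both / p_first

if __name__ == "__main__":
    print(second_black_given_first_black_no_replacement(5, 5))
    print(second_black_given_first_black_unknown_bag([Fraction(1, 5), Fraction(4, 5)]))
```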

This last type is that to which most instances conform which are drawn from the real world. A knowledge of the characteristics of some members of a population may give us a clue to the general character of the population in question. Yet it is this type, where there is a change in knowledge but no change in the material conditions from one instance to the next, which is most frequently overlooked.¹ It will be worth while to say something further about each of these two types.²

¹ Numerous instances could be quoted. To take a recent English example, reference may be made to Yule, Introduction to the Theory of Statistics, p. 251. Mr. Yule thinks that the condition of independence is satisfied if "the result of any one throw or toss does not affect, and is unaffected by, the results of the preceding and following tosses," and does not allow for the cases in which knowledge of the result is relevant apart from any change in the physical conditions.

² The types which I distinguish under four heads (the Bernoullian, the
10. For problems of the first type, where there is physical or material dependence between the successive trials, it is not possible, I think, to propose any general solution, since the probabilities of the successive trials may be modified in all kinds of different ways. But for particular problems, if the conditions are precise enough, solutions can be devised. The problem, for instance, of an urn containing black and white balls in known proportions, from which balls are drawn successively and not replaced,¹ is ingeniously solved by Czuber² with the aid of Stirling's Theorem. If σ is the number of balls and s the number of drawings, he reaches the interesting conclusion (assuming that σ, s and σ − s are all large) that the probability of the number of black balls lying within given limits is the same as it would be if the balls were replaced after each drawing and the number of drawings were $\frac{\sigma - s}{\sigma}\,s$ instead of s.
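Czuber's conclusion can be checked by direct computation. The sketch below is ours and uses arbitrary numbers: σ = 2000 balls, 600 of them black, and s = 800 drawings, so that (σ − s)s/σ = 480. It compares the exact probability that the number of black balls drawn without replacement lies within d of its expectation with the corresponding probability for 480 drawings with replacement and, for contrast, for 800 drawings with replacement. A non-integer d is used only to keep floating-point boundary cases out of the comparison.

```python
from math import exp, lgamma, log

def log_comb(n: int, k: int) -> float:
    """Natural logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def within_without_replacement(sigma: int, black: int, s: int, d: float) -> float:
    """Exact (hypergeometric) P(|n - s*black/sigma| <= d), balls not replaced."""
    mean = s * black / sigma
    lo, hi = max(0, s - (sigma - black)), min(s, black)
    return sum(exp(log_comb(black, n) + log_comb(sigma - black, s - n) - log_comb(sigma, s))
               for n in range(lo, hi + 1) if abs(n - mean) <= d)

def within_with_replacement(draws: int, p: float, d: float) -> float:
    """Exact binomial P(|n - draws*p| <= d), balls replaced after each drawing."""
    mean = draws * p
    return sum(exp(log_comb(draws, n) + n * log(p) + (draws - n) * log(1 - p))
               for n in range(draws + 1) if abs(n - mean) <= d)

if __name__ == "__main__":
    sigma, black, s, d = 2000, 600, 800, 10.5
    s_equiv = (sigma - s) * s // sigma  # 480 drawings with replacement
    print(within_without_replacement(sigma, black, s, d),
          within_with_replacement(s_equiv, black / sigma, d),
          within_with_replacement(s, black / sigma, d))
```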

In addition to the assumptions already stated, Professor Czuber's solution applies only to those cases where the limits, for which we wish to determine the probability, are narrow compared with the total number of black balls pσ. Professor Pearson³ has worked out the same problem in a much more general manner, so as to deal with the whole range, i.e. the frequency or probability of all possible ratios of black balls, even where s > pσ. The various forms of curve which result, according to the different relations existing between p, s and σ, supply examples of each of the different types of frequency curve which arise out of a

Poissonian, and the two described above) Bachelier (Calcul des probabilités) classifies as follows:

(i.) When the conditions are identical throughout, the problem has uniformity;

(ii.) When they vary from stage to stage, but according to a law given from the beginning and in a manner which does not depend upon what has happened at the earlier stages, it has independence;

(iii.) When they vary in a manner which depends upon what has happened at the earlier stages, it has connexity.

Bachelier gives solutions for each type on the assumption that the number of trials is very great, and that the number of successes or failures can be regarded as a continuous variable. This is the same kind of assumption as that made in the proof of Bernoulli's Theorem given in § 2, and is open to the same objections, or rather the value of the results is limited in the same way.

¹ It is of no consequence whether the balls are drawn successively and not replaced, or are drawn simultaneously.

² Loc. cit. vol. i.

³ "Skew Variation in Homogeneous Material," Phil. Trans. (1895), p. 360.
classification according to (i.) skewness or symmetry, (ii.) limitation of range in one, both or neither direction; and he designates, therefore, the curves which are thus obtained as generalised probability curves. His discussion of the properties of these curves is interesting, however, to the student of descriptive statistics rather than to the student of probability. The most generalised and, mathematically, by far the most elegant treatment of this problem with which I am acquainted is due to Professor Tschuprow.¹

Poisson, in attempting a somewhat similar problem,² arrives at a result which seems obviously contrary to good sense, by a curious but characteristic misapprehension of the meaning of 'independence' in probability. His problem is as follows: If l balls be taken out from an urn containing c black and white balls in known proportions, and not replaced, and if a further number of balls μ be then taken out, the probability that a given proportion $\frac{m}{m+n}$ of these μ balls will be black is independent of the number and the colour of the l balls originally drawn out. For, he argues, if l + μ balls are drawn out, the probability of a combination which is made up of l black and white balls in given proportions followed by μ balls, of which m are white and n black, must be the same as that of a similar combination in which the μ balls precede the l balls. Hence the probability of m white balls in μ drawings, given that the l balls have already been drawn out, must be equal to the probability of the same result when no balls have been previously drawn out. The reader will perceive that Poisson, thinking only of physical dependence, has been led to his paradoxical conclusion by a failure to distinguish between the cases where the proportion of black and white balls amongst the l balls originally drawn is known and where it is not. The fact of their having been drawn in certain proportions, provided that only the total number drawn is known and the proportions are unknown, does not influence the probability. Poisson states in his conclusion that the probability is independent of the number and colour of the l balls originally drawn. If he had added, as he ought, 'provided the number of each colour is

1 "Zur Theorie der Stabilität statistischer Reihen," p. 210, published in the Skandinavisk Aktuarietidskrift for 1919.

2 Loc. cit. pp. 231, 232.

unknown,' the air of paradox disappears. This is an exceedingly good example of the failure to perceive that a probability cannot be influenced by the occurrence of a material event, but only by such knowledge as we may have respecting the occurrence of the event.1
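Poisson's argument and the correction made to it here can both be checked by direct counting. The sketch below is only an illustration, and its helper names are hypothetical. It assumes an urn of `total` balls, `white` of them white, and counts white balls among the later draws, as Poisson's argument does. Three cases are compared: no balls removed beforehand, l balls removed whose colours are unknown, and l balls removed whose colours are known.

```python
import math

def hypergeom(k, white, total, draws):
    """Chance of exactly k white balls among `draws` balls taken without replacement."""
    return math.comb(white, k) * math.comb(total - white, draws - k) / math.comb(total, draws)

def after_unknown_removal(m, white, total, l, mu):
    """Chance of m white among mu further draws, after l balls of unknown colour were removed."""
    black = total - white
    lo, hi = max(0, l - black), min(l, white)   # feasible numbers j of white balls among the l
    return sum(hypergeom(j, white, total, l) * hypergeom(m, white - j, total - l, mu)
               for j in range(lo, hi + 1))

def after_known_removal(m, white, total, l, mu, j):
    """The same chance when the l removed balls are known to include exactly j white ones."""
    return hypergeom(m, white - j, total - l, mu)

if __name__ == "__main__":
    white, total, l, mu, m = 6, 10, 3, 4, 2
    print("no balls removed       :", hypergeom(m, white, total, mu))
    print("l removed, colours unknown:", after_unknown_removal(m, white, total, l, mu))
    print("l removed, all 3 white :", after_known_removal(m, white, total, l, mu, 3))
```

With these numbers the first two results agree, both being 90/210 (about .4286). The third is 18/35 (about .5143). Removing balls therefore alters the probability only when their colours become known.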

11. For problems of the second type, where knowledge of the result of one trial is capable of influencing the probability at the next apart from any change in the material conditions, there is likewise no general solution. The following artificial example, however, will illustrate the sort of considerations which are involved.

In the cases where Bernoulli's Theorem is applied to practical questions, the a priori probability is generally obtained empirically, by reference to the statistical frequency of each alternative in past experience under apparently similar conditions. Thus the a priori probability of a male birth is estimated by reference to the recorded proportion of male births in the past.2 The validity of estimating probabilities in this manner will be discussed later. For the purposes of this example, however, let us assume that the a priori probability has been calculated on this basis.

Thus the a priori probability p (= r/s) of an event is based on the observation of its occurrence r times out of s occasions on which the given conditions were present. Now, according to Bernoulli's Theorem directly applied, the probability of the event's occurring n times running is $p^n$ or $\left(\frac{r}{s}\right)^n$. But, if the event occurs at the first trial, the probability at the second

1 For an attempt to solve other problems of this type see Bachelier, Calcul des probabilités, chap. ix. (Probabilités connexes). I think, however, that the solutions of this chapter are vitiated by his assuming in the course of them both that certain quantities are very large and also, at a later stage, that the same quantities are infinitesimal. On this account, for example, his solution of the following difficult problem breaks down: Given an urn A with m white and n black balls and an urn B with m' white and n' black balls, if at each move a ball is taken from A and put into B, and at the same time a ball is taken from B and put into A, what is the probability after z moves that the urns A and B shall have a given composition?

2 Cf. Yule, Theory of Statistics, p. 1».">S: "We are not able to assign an a priori value to the chance p (i.e. of a male birth) as in the case of dice-throwing, but it is quite sufficiently accurate for practical purposes to use the proportion of male births actually observed if that proportion be based on a moderately large number of observations."

becomes $\frac{r+1}{s+1}$, and so on. Hence the probability P, properly calculated, of n successive occurrences is

$$P = \frac{r}{s}\cdot\frac{r+1}{s+1}\cdot\frac{r+2}{s+2}\cdots\frac{r+n-1}{s+n-1}.$$

Hence

$$P = \frac{(r+n-1)!\,(s-1)!}{(s+n-1)!\,(r-1)!},$$

so that, by Stirling's Theorem, provided that r and s are large,

$$P = p^n Q \text{ approximately, where } Q = \frac{\left(1+\frac{n}{r}\right)^{r+n-\frac{1}{2}}}{\left(1+\frac{n}{s}\right)^{s+n-\frac{1}{2}}}.$$

Thus, in this case, the assumption of Bernoulli's Theorem is approximately correct only if Q is nearly unity. This condition is not satisfied unless n is small both compared with r and compared with s. It is very important to notice that two conditions are involved. Not only must the experience upon which the a priori probability is based be extensive in comparison with the number of instances to which we apply our prediction; but also the number of previous instances multiplied by the probability based upon them, i.e. sp (= r), must be large in comparison with the number of new instances. Thus, even where the prior experience upon which we found the initial probability p is very extensive, we must not, if p is very small, say that the probability of n successive occurrences is approximately $p^n$, unless n is also small. Similarly, if we wish to determine by the methods of Bernoulli the probability of n occurrences and m failures on m + n occasions, it is necessary that we should have m and n small compared with s, n small compared with r, and m small compared with s − r.1
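The gap between the naive value $p^n$ and the properly calculated product shows up clearly in numbers. This Python sketch has hypothetical function names. It computes the product r/s · (r+1)/(s+1) ⋯ (r+n−1)/(s+n−1) through log-gamma functions, together with the Stirling factor, taken here to be Q = (1 + n/r)^(r+n−1/2) / (1 + n/s)^(s+n−1/2). The particular values of r, s and n are illustrative assumptions.

```python
import math

def exact_run_probability(r, s, n):
    """P = r/s * (r+1)/(s+1) * ... * (r+n-1)/(s+n-1), computed in logarithms."""
    # The product equals Gamma(r+n) Gamma(s) / (Gamma(r) Gamma(s+n)).
    log_p = (math.lgamma(r + n) - math.lgamma(r)
             - math.lgamma(s + n) + math.lgamma(s))
    return math.exp(log_p)

def naive_probability(r, s, n):
    """Bernoulli's Theorem applied directly: p**n with p = r/s held fixed."""
    return (r / s) ** n

def stirling_q(r, s, n):
    """Assumed Stirling factor Q in P ~ p**n * Q, valid when r and s are large."""
    return (1 + n / r) ** (r + n - 0.5) / (1 + n / s) ** (s + n - 0.5)

if __name__ == "__main__":
    for r, s, n in [(5000, 10000, 5), (500, 1000, 50), (10, 1000, 10)]:
        exact = exact_run_probability(r, s, n)
        naive = naive_probability(r, s, n)
        print(f"r={r} s={s} n={n}: P/p^n = {exact / naive:.4f}, Q = {stirling_q(r, s, n):.4f}")
```

For r = 5000, s = 10000 and n = 5, the ratio P/p^n is close to unity. For r = 10, s = 1000 and n = 10, n is small compared with s but not with r, and P comes out roughly thirty times as great as $p^n$.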

The case solved above is the simplest possible. The general problem is as follows: If an event has occurred x times in the first y trials, its probability at the (y + 1)th is $\frac{r+x}{s+y}$; determine the a priori probability of the event's occurring p times in q trials. If the a priori probability in question is represented by $\phi(p, q)$, we have

$$\phi(p, q) = \frac{r+p-1}{s+q-1}\,\phi(p-1, q-1) + \frac{s-r+q-p-1}{s+q-1}\,\phi(p, q-1).$$

I know of no solution of this, even approximate. But we may say that the conditions are those of supernormal dispersion as compared with Bernoulli's conditions. That is to say, the probability of a proportion differing widely from r/s is greater than in Bernoullian conditions; for when the proportion begins to diverge, it becomes more probable that it will continue to diverge in the same direction. If, on the other hand, the conditions of the problem had been such that, when the proportion begins to diverge, it becomes more probable that it will recover itself and tend back towards r/s (as when we draw balls without replacing them from a bag of known composition), we should have subnormal dispersion.2
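The contrast between supernormal and subnormal dispersion can be shown numerically. The sketch assumes, as in the simple case solved above, that after x occurrences in y trials the chance at the next trial is (r + x)/(s + y), and it builds the distribution of the number of occurrences one trial at a time. For comparison it also computes the Bernoullian (binomial) case and the case of drawing without replacement from a bag of known composition. The function names and the figures r = 3, s = 10, q = 20 are illustrative assumptions.

```python
import math

def contagion_distribution(r, s, q):
    """phi[p] for p = 0..q: chance of p occurrences in q trials when, after x
    occurrences in y trials, the chance at the next trial is (r + x)/(s + y)."""
    phi = [1.0]                            # before any trial: 0 occurrences, certainly
    for y in range(q):                     # step from y trials to y + 1 trials
        nxt = [0.0] * (y + 2)
        for x, prob in enumerate(phi):
            chance = (r + x) / (s + y)
            nxt[x + 1] += prob * chance    # occurrence at trial y + 1
            nxt[x] += prob * (1 - chance)  # failure at trial y + 1
        phi = nxt
    return phi

def binomial_distribution(p, q):
    """Bernoullian conditions: the chance stays p at every trial."""
    return [math.comb(q, k) * p**k * (1 - p) ** (q - k) for k in range(q + 1)]

def without_replacement_distribution(black, total, q):
    """q balls drawn without replacement from a bag of known composition."""
    return [math.comb(black, k) * math.comb(total - black, q - k) / math.comb(total, q)
            for k in range(q + 1)]

def mean_and_variance(dist):
    mean = sum(k * w for k, w in enumerate(dist))
    return mean, sum((k - mean) ** 2 * w for k, w in enumerate(dist))

if __name__ == "__main__":
    r, s, q = 3, 10, 20
    for name, dist in [("contagion", contagion_distribution(r, s, q)),
                       ("Bernoullian", binomial_distribution(r / s, q)),
                       ("no replacement", without_replacement_distribution(30, 100, q))]:
        print(name, mean_and_variance(dist))
```

All three distributions have the same mean, q·r/s = 6. The variance, however, is largest under the first rule and smallest when the balls are not replaced, which matches the supernormal and subnormal dispersion described above.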

12. The condition elucidated in the preceding paragraph is frequently overlooked by statisticians. The following example from Czuber3 will be sufficient for the purpose of illustration. Czuber's argument is as follows:

In the period 1800-1877 there were registered in Austria
m = 4,311,076 male births,
n = 4,052,193 female births,
s = 8,363,269;

1 This paragraph is concerned with a different point from that dealt with in Professor Pearson's article "On the Influence of Past Experience on Future Expectation," to which it bears a superficial resemblance. Professor Pearson's article, which deals not with Bernoulli's Theorem but with Laplace's "Rule of Succession," will be referred to in § 16 of this chapter and in § 12 of the next.

2 Bachelier (Calcul des probabilités, p. 201) classifies these two kinds of conditions as conditions accélératrices and conditions retardatrices.

3 Loc. cit. vol. ii. p. \~>. I choose my example from Professor Czuber because he is usually so careful an exponent of theoretical statistics.

for the succeeding period, 1877-1899, we are given only
m' = 6,533,961 male births;
what conclusion can we draw as to the number n' of female births? We can conclude, according to Czuber, that the most probable value is

$$n'_0 = \frac{nm'}{m} = 6{,}141{,}587,$$

and that there is a probability P = .9999779 that n' will lie between the limits 6,118,361 and 6,164,813.

It seems in plain opposition to good sense that on such evidence we should be able with practical certainty $\left(P = .9999779 = 1 - \frac{1}{45250}\right)$ to estimate the number of female births within such narrow limits. And we see that the conditions laid down in § 11 have been flagrantly neglected. The
number  of  cases,  over  which  the  prediction  based  on  Bernoulli's 
Theorem  is  to  extend,  actually  exceeds  the  number  of  cases  upon 
which  the  a  priori  probability  has  been  based.  It  may  be  added 
that  for  the  period,  1877-1894,  the  actual  value  of  n  did  lie 
between  the  estimated  limits,  but  that  for  the  period,  1895- 
1905,  it  lay  outside  limits  to  which  the  same  method  had 
attributed  practical  certainty. 
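Czuber's figures can be approximately reproduced, which makes plain how far the prediction reaches beyond its base. The text does not say how the limits were obtained. This sketch therefore assumes, as a reconstruction and not as Czuber's stated method, a normal-law interval $n'_0 \pm \gamma\sqrt{2\sigma^2}$ with γ = 3, so that P = erf 3 ≈ .9999779. It also assumes σ² = n m' s (m + m')/m³, which combines the uncertainty of the ratio n/m estimated from s births with fresh sampling in the new period. The variable names are illustrative.

```python
import math

# Figures quoted in the text
m, n = 4_311_076, 4_052_193      # male and female births, first period
s = m + n                        # 8,363,269 instances in all
m_new = 6_533_961                # male births, succeeding period

# Most probable number of female births in the new period: n'_0 = n m'/m
n0 = n * m_new / m

# ASSUMPTION (a reconstruction, not taken from Czuber): variance of n' as
#   (n m'/m)**2 * s/(m n)   [uncertainty of the ratio n/m, from s births]
# + m' n s / m**2           [fresh sampling in the new period]
variance = n * m_new * s * (m + m_new) / m**3

gamma = 3.0                      # erf(3) = 0.99997791..., the P of the text
half_width = gamma * math.sqrt(2 * variance)
P = math.erf(gamma)

if __name__ == "__main__":
    print(f"most probable n' : {n0:,.0f}")
    print(f"limits           : {n0 - half_width:,.0f} to {n0 + half_width:,.0f}")
    print(f"P                : {P:.7f}")
    print(f"new instances    : {m_new + n0:,.0f} against s = {s:,}")
```

Under these assumptions the most probable value comes out at about 6,141,592 and the limits fall within a few units of those quoted. The last line makes the objection in the text concrete: the new instances (about 12.7 million) outnumber the 8,363,269 on which the ratio was based.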

That  Professor  Czuber  should  have  thought  his  own  argument 
plausible,  is  to  be  explained,  I  think,  by  his  tacitly  taking  account 
in  his  own  mind  of  evidence  not  stated  in  the  problem.  He  was 
relying  upon  the  fact  that  there  is  a  great  mass  of  evidence  for 
believing  that  the  ratio  of  male  to  female  births  is  peculiarly 
stable.  But  he  has  not  brought  this  into  the  argument,  and  he 
has  not  used  as  his  a  priori  probability  and  as  his  coefficient  of 
dispersion the values which the whole mass of this evidence would
have  led  him  to  adopt.  Would  not  the  argument  have  seemed 
very  preposterous  if  m  had  been  the  number  of  males  called 
George,  and  n  the  number  of  females  called  Mary  ?  Would  it  not 
have  seemed  rather  preposterous  if  m  had  been  the  number  of 
legitimate  births  and  n  the  number  of  illegitimate  births  ?  Clearly 
we  must  take  account  of  other  considerations  than  the  mere 
numerical values of m and n in estimating our a priori probability. But this question belongs to the subject-matter of later chapters, and, quite apart from the manner of calculation of the a priori probability, the argument is invalidated by the fact that an a priori probability founded on 8,363,269 instances, without corroborative evidence of a non-statistical character, cannot be assumed stable through a calculation which extends over 12,700,000 instances.

13.  Before  we  leave  the  theorems  of  Bernoulli  and  Poisson, 
it  is  necessary  to  call  attention  to  a  very  remarkable  theorem  by 
Tchebycheff,  from  which  both  of  the  above  theorems  can  be 
derived  as  special  cases.  This  result  is  reached  rigorously  and 
without  approximation,  by  means  of  simple  algebra  and  without 
the  aid  of  the  differential  calculus.  Apart  from  the  beauty 
and  simplicity  of  the  proof,  the  theorem  is  so  valuable  and  so 
little known that it will be worth while to quote it in full:1

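Let x, y, z, … represent certain magnitudes, of which x can take the values $x_1, x_2, \ldots, x_l$ with probabilities $p_1, p_2, \ldots, p_l$ respectively, y the values $y_1, y_2, \ldots, y_m$ with probabilities $q_1, q_2, \ldots, q_m$, z the values $z_1, z_2, \ldots, z_n$ with probabilities $r_1, r_2, \ldots, r_n$, and so on, so that

$$\sum_{\kappa=1}^{l} p_\kappa = 1, \quad \sum_{\lambda=1}^{m} q_\lambda = 1, \quad \sum_{\mu=1}^{n} r_\mu = 1, \text{ etc.}$$

Write

$$\sum_{\kappa=1}^{l} p_\kappa x_\kappa = a, \quad \sum_{\lambda=1}^{m} q_\lambda y_\lambda = b, \quad \sum_{\mu=1}^{n} r_\mu z_\mu = c, \text{ etc.},$$

and

$$\sum_{\kappa=1}^{l} p_\kappa x_\kappa^2 = a_1, \quad \sum_{\lambda=1}^{m} q_\lambda y_\lambda^2 = b_1, \quad \sum_{\mu=1}^{n} r_\mu z_\mu^2 = c_1, \text{ etc.},$$

so that we can describe a as the mathematical expectation or average value of x, and $a_1$ as the mathematical expectation or average value of $x^2$, etc.

The probability that the sum x + y + z + … will have for its value $x_\kappa + y_\lambda + z_\mu + \cdots$ is $p_\kappa q_\lambda r_\mu \cdots$ (provided that the values of x, y, z, … are independent). Hence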

1 From Journ. Liouville (2), xii., 1867, "Des valeurs moyennes," an article translated from the Russian of Tchebycheff. This proof is also quoted by Czuber, loc. cit. p. I'll', through whom I first became acquainted with it. Most of Tchebycheff's work was published previous to 1870 and appeared originally in Russian. It was not easily accessible, therefore, until the publication at Petrograd in 1907 of the collected edition of his works in French. His theorems are, consequently, not nearly so well known as they deserve to be, although his most important theorems were reproduced from time to time in the journals of Crelle and Liouville. For full references see the Bibliography.

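$$\sum (x_\kappa + y_\lambda + z_\mu + \cdots - a - b - c - \cdots)^2\, p_\kappa q_\lambda r_\mu \cdots,$$

summed for all values of κ, λ, μ, …, is the mathematical expectation of

$$(x_\kappa + y_\lambda + z_\mu + \cdots - a - b - c - \cdots)^2.$$

Now

$$\sum (x_\kappa^2 - 2a x_\kappa + a^2)\, p_\kappa = \sum p_\kappa x_\kappa^2 - 2a \sum p_\kappa x_\kappa + a^2 \sum p_\kappa = a_1 - a^2.$$

Also $\sum q_\lambda r_\mu \cdots = 1$, summed for all values of λ, μ, …, and $\sum p_\kappa = 1$; while

$$\sum\sum (x_\kappa - a)(y_\lambda - b)\, p_\kappa q_\lambda = \sum\sum (x_\kappa y_\lambda - b x_\kappa - a y_\lambda + ab)\, p_\kappa q_\lambda = ab - ab - ab + ab = 0.$$

Therefore

$$\sum (x_\kappa + y_\lambda + z_\mu + \cdots - a - b - c - \cdots)^2\, p_\kappa q_\lambda r_\mu \cdots = a_1 + b_1 + c_1 + \cdots - a^2 - b^2 - c^2 - \cdots,$$

whence

$$\sum \frac{(x_\kappa + y_\lambda + z_\mu + \cdots - a - b - c - \cdots)^2}{\alpha^2 (a_1 + b_1 + c_1 + \cdots - a^2 - b^2 - c^2 - \cdots)}\, p_\kappa q_\lambda r_\mu \cdots = \frac{1}{\alpha^2},$$

where the summation extends over all values of κ, λ, μ, … and α is some arbitrary number greater than unity.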

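If we omit those terms of the sum on the left-hand side of the above equation for which

$$\frac{(x_\kappa + y_\lambda + z_\mu + \cdots - a - b - c - \cdots)^2}{\alpha^2 (a_1 + b_1 + c_1 + \cdots - a^2 - b^2 - c^2 - \cdots)} < 1,$$

and write unity for this expression in the remaining terms, both these processes diminish the magnitude of the left-hand side. Hence

$$\sum p_\kappa q_\lambda r_\mu \cdots < \frac{1}{\alpha^2},$$

where the summation covers those sets of values only for which

$$\frac{(x_\kappa + y_\lambda + z_\mu + \cdots - a - b - c - \cdots)^2}{\alpha^2 (a_1 + b_1 + c_1 + \cdots - a^2 - b^2 - c^2 - \cdots)} \ge 1.$$

If P is the probability that

$$\frac{(x_\kappa + y_\lambda + z_\mu + \cdots - a - b - c - \cdots)^2}{\alpha^2 (a_1 + b_1 + c_1 + \cdots - a^2 - b^2 - c^2 - \cdots)}$$

is equal to or less than unity, it follows that

$$1 - P < \frac{1}{\alpha^2}, \quad \text{i.e.} \quad P > 1 - \frac{1}{\alpha^2}.$$

Hence the probability that the sum $x_\kappa + y_\lambda + z_\mu + \cdots$ lies between the limits

$$a + b + c + \cdots - \alpha\sqrt{a_1 + b_1 + c_1 + \cdots - a^2 - b^2 - c^2 - \cdots}$$

and

$$a + b + c + \cdots + \alpha\sqrt{a_1 + b_1 + c_1 + \cdots - a^2 - b^2 - c^2 - \cdots}$$

is greater than $1 - \frac{1}{\alpha^2}$, where α is some number greater than unity.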
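Tchebycheff's bound can be checked against an exact calculation for a few small independent magnitudes. In this sketch the helper names and the three example distributions are hypothetical. Every combination $x_\kappa + y_\lambda + z_\mu$ is enumerated with weight $p_\kappa q_\lambda r_\mu$, and the exact probability of falling within $a + b + c \pm \alpha\sqrt{a_1 + b_1 + c_1 - a^2 - b^2 - c^2}$ is compared with $1 - 1/\alpha^2$.

```python
import itertools
import math

def moments(values, probs):
    """Return (a, a1): the expectations of x and of x squared."""
    a = sum(v * p for v, p in zip(values, probs))
    a1 = sum(v * v * p for v, p in zip(values, probs))
    return a, a1

def tchebycheff_check(variables, alpha):
    """variables: list of (values, probabilities) for independent x, y, z, ...
    Returns the exact probability that the sum lies within the limits,
    together with Tchebycheff's lower bound 1 - 1/alpha**2."""
    centre = 0.0                     # a + b + c + ...
    spread = 0.0                     # a1 + b1 + ... - a^2 - b^2 - ...
    for values, probs in variables:
        a, a1 = moments(values, probs)
        centre += a
        spread += a1 - a * a
    radius = alpha * math.sqrt(spread)
    inside = 0.0
    # every combination (x_k, y_l, z_m, ...) with weight p_k q_l r_m ...
    for combo in itertools.product(*(list(zip(v, p)) for v, p in variables)):
        total = sum(value for value, _ in combo)
        weight = math.prod(prob for _, prob in combo)
        if abs(total - centre) <= radius:
            inside += weight
    return inside, 1 - 1 / alpha**2

if __name__ == "__main__":
    variables = [
        ([0, 1, 5], [0.5, 0.3, 0.2]),        # x
        ([-2, 3], [0.6, 0.4]),               # y
        ([1, 2, 3, 4, 5, 6], [1 / 6] * 6),   # z: a fair die
    ]
    for alpha in (1.2, 1.5, 2.0, 3.0):
        exact, bound = tchebycheff_check(variables, alpha)
        print(f"alpha={alpha}: exact {exact:.4f}, bound {bound:.4f}")
```

The exact probability always stays at or above the bound, and usually well above it. That is the sense in which the theorem is cautious.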

This result constitutes Tchebycheff's Theorem. It may also be written in the following form:

Let n be the number of the magnitudes x, y, z, …, and write $\alpha = \frac{\sqrt{n}}{t}$; then the probability that the arithmetic mean $\frac{x + y + z + \cdots}{n}$ lies between the limits

$$\frac{a + b + c + \cdots}{n} \pm \frac{1}{t}\sqrt{\frac{a_1 + b_1 + c_1 + \cdots}{n} - \frac{a^2 + b^2 + c^2 + \cdots}{n}}$$

is greater than $1 - \frac{t^2}{n}$.

It is also easy to show, as a deduction from Tchebycheff's Theorem, that if an amount A is won when an event of probability p (= 1 − q) occurs and an amount B lost when it fails, then in s trials the probability that the total winnings (or losses) will lie between the limits

$$s(pA - qB) \pm \alpha (A + B)\sqrt{spq}$$

is greater than $1 - \frac{1}{\alpha^2}$.
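The variance step behind this corollary is that a single trial contributes $a_1 - a^2 = pA^2 + qB^2 - (pA - qB)^2 = pq(A + B)^2$. The brief sketch below uses hypothetical names and illustrative parameters, including a roulette-like stake. It checks that identity and compares the exact binomial probability of the stated limits with the bound.

```python
import math

def single_trial_spread(p, A, B):
    """a1 - a^2 for one trial (win A with chance p, lose B otherwise)."""
    q = 1 - p
    a = p * A - q * B
    a1 = p * A * A + q * B * B
    return a1 - a * a                     # equals p q (A + B)^2

def winnings_check(p, A, B, s, alpha):
    """Exact chance that total winnings in s trials lie within
    s(pA - qB) +/- alpha (A + B) sqrt(s p q), and the bound 1 - 1/alpha**2."""
    q = 1 - p
    centre = s * (p * A - q * B)
    radius = alpha * (A + B) * math.sqrt(s * p * q)
    inside = 0.0
    for wins in range(s + 1):             # binomial number of successes
        total = wins * A - (s - wins) * B
        if abs(total - centre) <= radius:
            inside += math.comb(s, wins) * p**wins * q ** (s - wins)
    return inside, 1 - 1 / alpha**2

if __name__ == "__main__":
    print(winnings_check(1 / 37, 35, 1, 100, 2.0))
```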

14. From this very general result for the probable limits of a sum composed of a number of independently varying magnitudes, Bernoulli's Theorem is easily derived. For let there be s observations or trials, and s magnitudes $x_1, x_2, \ldots, x_s$ corresponding, such that x = 1 when the event under consideration occurs, and x = 0 when it fails. If the probability of the event's occurrence is p, we have $a = p$, $b = p$, etc., and $a_1 = p$, $b_1 = p$, etc. Hence the probability P that the number of the event's occurrences will lie between the limits $sp \pm \alpha\sqrt{sp - sp^2}$, i.e. between the limits $sp \pm \alpha\sqrt{spq}$ where q = 1 − p, is greater than $1 - \frac{1}{\alpha^2}$. If we compare this formula with the formula for Bernoulli's Theorem already given, we find that, where this formula gives $P > 1 - \frac{1}{\alpha^2}$, Bernoulli's Theorem with greater precision gives

$$P = \frac{2}{\sqrt{\pi}}\int_0^{\alpha/\sqrt{2}} e^{-t^2}\,dt.$$

The degree of superiority in the matter of precision supplied by the latter can be illustrated by the following table:


$\alpha^2$    $\frac{2}{\sqrt{\pi}}\int_0^{\alpha/\sqrt{2}} e^{-t^2}\,dt$    $1 - \frac{1}{\alpha^2}$
1.5           .7788                                              .333
2             .8427                                              .5
4.5           .9661                                              .7778
8             .9953                                              .875
12.5          .9996                                              .92
18            .99998                                             .9445
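The table can be regenerated on the assumption, suggested by the column headings, that Bernoulli's value is $\frac{2}{\sqrt{\pi}}\int_0^{\alpha/\sqrt{2}} e^{-t^2}\,dt$. That quantity is Python's `math.erf` evaluated at α/√2. The helper name is hypothetical.

```python
import math

def comparison_row(alpha_sq):
    """Bernoulli's normal-law value and Tchebycheff's bound for limits sp +/- alpha sqrt(spq)."""
    alpha = math.sqrt(alpha_sq)
    bernoulli = math.erf(alpha / math.sqrt(2))   # (2/sqrt(pi)) * integral_0^{alpha/sqrt 2} e^{-t^2} dt
    tchebycheff = 1 - 1 / alpha_sq
    return bernoulli, tchebycheff

if __name__ == "__main__":
    for alpha_sq in (1.5, 2, 4.5, 8, 12.5, 18):
        b, t = comparison_row(alpha_sq)
        print(f"{alpha_sq:>5}  {b:.5f}  {t:.4f}")
```

On this reading the computed values agree with the table to the places printed, with two exceptions. The first row comes out at about .7793 rather than .7788, and the last Tchebycheff entry at .9444 rather than .9445.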

Thus when the limits are narrow and α is small, Bernoulli's formula gives a value of P very much in excess of $1 - \frac{1}{\alpha^2}$. But Bernoulli's formula involves a process of approximation which is only valid when s is large. Tchebycheff's formula involves no such process and is equally valid for all values of s. We have seen in § 11 that there are numerous cases in which, for a different reason, Bernoulli's formula exaggerates the results; and, therefore, Tchebycheff's more cautious limits may sometimes prove useful.

The deduction of a corresponding form of Poisson's Theorem from Tchebycheff's general formula obviously follows on similar lines. For we put1 $a = p_1$, $b = p_2$, etc., and $a_1 = p_1$, $b_1 = p_2$, etc.,

1 I am using the same notation as that used for Poisson's Theorem in § 8.

and find that the probability that the number of the event's occurrences will lie between the limits

$$\sum p_\lambda \pm \alpha\sqrt{\sum p_\lambda - \sum p_\lambda^2},$$

i.e. between the limits $sp \pm \alpha\sqrt{\sum p_\lambda q_\lambda}$,
i.e.  between  the  limits     ••</'  ±  v/2a//  v/s, 

is greater than $1 - \frac{1}{\alpha^2}$.

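In Crelle's Journal1 Tchebycheff proves Poisson's Theorem directly by a method similar to his general method, and also obtains several supplementary results, such as the following:

I. If the chances of an event E in μ consecutive trials are $p_1, p_2, \ldots, p_\mu$ respectively, and their sum is s, the probability that E will occur at least m times is less than

$$\frac{1}{2(m-s)}\sqrt{\frac{m(\mu-m)}{\mu}}\left(\frac{s}{m}\right)^{m}\left(\frac{\mu-s}{\mu-m}\right)^{\mu-m+1},$$

provided that m > s + 1;

II. and the probability that E will not occur more than n times is less than

$$\frac{1}{2(s-n)}\sqrt{\frac{n(\mu-n)}{\mu}}\left(\frac{s}{n}\right)^{n+1}\left(\frac{\mu-s}{\mu-n}\right)^{\mu-n},$$

provided that n < s − 1.

III. Hence the probability that E will occur less than m times and more than n times is greater than

$$1 - \frac{1}{2(m-s)}\sqrt{\frac{m(\mu-m)}{\mu}}\left(\frac{s}{m}\right)^{m}\left(\frac{\mu-s}{\mu-m}\right)^{\mu-m+1} - \frac{1}{2(s-n)}\sqrt{\frac{n(\mu-n)}{\mu}}\left(\frac{s}{n}\right)^{n+1}\left(\frac{\mu-s}{\mu-n}\right)^{\mu-n},$$

provided m > s + 1 and n < s − 1.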
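Result I can be compared with exact tail probabilities. The sketch computes the exact distribution of the number of occurrences for given chances $p_1, \ldots, p_\mu$. As an assumption, it reads the damaged formula of result I as $\frac{1}{2(m-s)}\sqrt{\frac{m(\mu-m)}{\mu}}\left(\frac{s}{m}\right)^{m}\left(\frac{\mu-s}{\mu-m}\right)^{\mu-m+1}$. The function names and the example chances are hypothetical.

```python
import math

def occurrence_distribution(chances):
    """Exact distribution of the number of occurrences when trial i has chance chances[i]."""
    dist = [1.0]
    for p in chances:
        nxt = [0.0] * (len(dist) + 1)
        for k, w in enumerate(dist):
            nxt[k] += w * (1 - p)       # E fails at this trial
            nxt[k + 1] += w * p         # E occurs at this trial
        dist = nxt
    return dist

def upper_tail_bound(mu, s, m):
    """Assumed form of result I; requires s + 1 < m < mu."""
    return (1 / (2 * (m - s)) * math.sqrt(m * (mu - m) / mu)
            * (s / m) ** m * ((mu - s) / (mu - m)) ** (mu - m + 1))

if __name__ == "__main__":
    chances = [0.2] * 20 + [0.6] * 20   # mu = 40 trials, sum s = 16
    mu, s = len(chances), sum(chances)
    dist = occurrence_distribution(chances)
    for m in (19, 22, 26):
        print(m, sum(dist[m:]), upper_tail_bound(mu, s, m))
```

In the cases tried, this expression exceeds the exact probability of at least m occurrences, sometimes by only about a third. That is at least consistent with this reading of the formula.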

15. Tchebycheff's methods have been set out and his results admirably extended by A. A. Markoff.2 And some

1 Vol. 33 (1846), "Démonstration élémentaire d'une proposition générale de la théorie des probabilités."

2 The reader is referred to Markoff's Wahrscheinlichkeitsrechnung, and particularly to p. 'IT, for a striking development, along mathematical lines, of

developments along the same lines by Tschuprow ("Zur Theorie der Stabilität statistischer Reihen," Skandinavisk Aktuarietidskrift, 1919) have convinced me that Tchebycheff's discovery is far more than a technical device for solving a special problem, and points the way to the fundamental method for attacking these questions on the mathematical side. The Laplacian mathematics, although it still holds the field in most text-books, is really obsolete, and ought to be replaced by the very beautiful work which we owe to these three Russians.

16. There is one other investigation relating to Bernoulli's Theorem which deserves remark. I have already pointed out, in § 2, that the dispersion about the most probable value is unsymmetrical, even when the conditions for the applicability of Bernoulli's Theorem in its non-approximate form are strictly fulfilled. The usual approximation for the probability of a divergence h from the most probable number of occurrences (the notation is that of § 2 above) takes the form

$$\frac{1}{\sqrt{2\pi mpq}}\, e^{-\frac{h^2}{2mpq}},$$

which is the same for +h as for −h. This fact has led to the want of symmetry being very generally overlooked; and it is not uncommon to assume that the probability of a given divergence less than pm is equal to that of the same divergence in excess of pm, and, in general, that the probability of the frequency's exceeding pm in a set of m trials is equal to that of its falling short of pm.

That  this  is  not  strictly  the  case  is  obvious.  If  a  die  is  cast 
60  times,  the  most  probable  number  of  appearances  of  the  ace 
is  10  ;  but  the  ace  is  more  likely  to  appear  9  times  than  11  times  ; 
and  much  more  likely  (about  5  times  as  likely)  not  to  appear  at 
all  than  to  appear  exactly  20  times.  That  this  must  be  so  will 
be  clear  to  the  reader  (without  his  requiring  to  trouble  himself 
with  the  algebra),  when  he  reflects  that  the  ace  cannot  appear 
less  often  than  not  at  all,  whereas  it  may  well  appear  more  than 
20  times,  so  that  the  smallness  of  the  possible  divergence  in 
defect  from  the  most  probable  value  10,  as  compared  with  the 
possible  divergence  in  excess,  must  be  made  up  for  by  the  greater 

Tchebycheff's leading idea. Further references to later memoirs, which, being in the Russian language, are inaccessible to me, will be found in the Bibliography.

frequency of any given defect as compared with the corresponding excess. Thus the actual frequency, in a series of trials, of an event whose probability at each trial is less than ½ is likely to fall short of its most probable value more often than it exceeds it. What is in fact true is that the mathematical expectation of deficiency is equal to the mathematical expectation of excess, i.e. that the sum of the possible deficiencies, each multiplied by its probability, is equal to the sum of the possible excesses, each multiplied by its probability.
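Several of the claims about the die can be checked by computing the binomial probabilities for 60 throws with chance 1/6 at each throw. This short sketch uses illustrative variable names. It checks three things: the comparison of 9 appearances with 11, the greater probability of falling short of 10 than of exceeding it, and the equality of the expected deficiency and the expected excess.

```python
import math

trials, p = 60, 1 / 6
most_probable = 10                                   # = trials * p
pmf = [math.comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in range(trials + 1)]

fall_short = sum(pmf[:most_probable])                # frequency below 10
exceed = sum(pmf[most_probable + 1:])                # frequency above 10
expected_deficiency = sum((most_probable - k) * pmf[k] for k in range(most_probable))
expected_excess = sum((k - most_probable) * pmf[k]
                      for k in range(most_probable + 1, trials + 1))

if __name__ == "__main__":
    print(f"P(9) = {pmf[9]:.5f}, P(11) = {pmf[11]:.5f}")
    print(f"fall short {fall_short:.5f}, exceed {exceed:.5f}")
    print(f"expected deficiency {expected_deficiency:.6f}, expected excess {expected_excess:.6f}")
```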

The actual measurement of this want of symmetry, and the determination of the conditions in which it can be safely neglected, involve laborious mathematics, of which I am acquainted with only one direct investigation, that published in the Proceedings of the London Mathematical Society by Mr. T. C. Simmons.1

For the details of the proof I must refer the reader to Mr. Simmons's article. His principal theorem2 is as follows: If $\frac{1}{a+1}$ is the probability of the event at each trial and $n(a+1)$ the number of trials, n and a being integers,3 the probability that the frequency of occurrence will fall short of n is always greater than the probability that it will exceed n. The difference between the two probabilities is a maximum when n = 1 and diminishes constantly as n increases. It lies always between certain multiples of the greatest term in the expansion of $\left(\frac{1}{a+1} + \frac{a}{a+1}\right)^{n(a+1)}$, and is approximately equal, when n is very large, to $\frac{a-1}{3(a+1)}$ times that greatest term.

1 "A New Theorem in Probability." Mr. Simmons claimed novelty for his investigation, and so far as I know this claim is justified; but recent investigations obtaining closer approximations to Bernoulli's Theorem by means of the Method of Moments are essentially directed towards the same problem.

A somewhat analogous point has, however, been raised by Professor Pearson in his article (Phil. Mag., 1907) on "The Influence of Past Experience on Future Expectation." He brings out an exactly similar want of symmetry in the probabilities of the various possible frequencies about the most probable frequency, when the calculation is based not on Bernoulli's Theorem, as in Mr. Simmons's investigation, but on Laplace's rule of succession (see next chapter). The want of symmetry has also been pointed out by Professor Lexis (Abhandlungen, p. IliO).

2 I am not giving his own enunciation of it.

3 Mr. Simmons does not seem to have been able to remove this restriction on the generality of his theorem, but there does not seem much reason to doubt that it can be removed.

3 


The following table gives the value of the excess Δ of the probability of a frequency less than pm over the probability of a frequency greater than pm, for various values of p, the probability, and m, the number of trials $\left[p = \frac{1}{a+1},\ m = n(a+1)\right]$, as calculated by Mr. Simmons:

p         m       Δ
1/3       3       .037037
1/3       15      .02243662
1/3       24      .0182706
1/4       4       .054687
1/4       20      .03201413
1/10      10      .084777
1/10      20      .068673713
1/100     100     .101813
1/100     200     .081324387
1/1000    1000    .103454

Thus, unless not only m but also mp is large, the want of symmetry is likely to be appreciable. For example, it is easily found that in 100 sets of 4 trials each, where p = 1/4, the actual frequency is likely to exceed the most probable 26 times and to fall short of it 31 times; and in 100 sets of 10 trials each, where p = 1/10, to exceed 26 times and to fall short 34 times.
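Mr. Simmons's Δ can be recomputed directly from the binomial terms. The sketch below is a hypothetical helper written in Simmons's parametrisation, p = 1/(a + 1) and m = n(a + 1), so that pm = n. It works in logarithms so that the larger cases do not overflow.

```python
import math

def simmons_excess(a, n):
    """Delta = P(frequency < n) - P(frequency > n) when the chance at each
    trial is 1/(a+1) and there are m = n(a+1) trials, so that pm = n."""
    p, m = 1 / (a + 1), n * (a + 1)

    def term(k):
        # binomial term C(m, k) p^k (1-p)^(m-k), computed in logarithms
        return math.exp(math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
                        + k * math.log(p) + (m - k) * math.log1p(-p))

    return sum(term(k) for k in range(n)) - sum(term(k) for k in range(n + 1, m + 1))

if __name__ == "__main__":
    for a, n in [(2, 1), (2, 5), (2, 8), (3, 1), (9, 1), (99, 1), (999, 1)]:
        print(f"p = 1/{a + 1}, m = {n * (a + 1)}: {simmons_excess(a, n):.6f}")
```

This agrees with the printed entries for m = 3, 4, 10 and 1000 to the places shown. For p = 1/100 and m = 100, however, it gives about .101804 rather than .101813.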

Mr. Simmons was first directed to this investigation by noticing, in the examination of sets of random digits, that "each digit presented itself, with unexpected frequency, less than 1/10 of the number of times. For instance, in 100 sets of 150 digits each, I found that a digit presented itself in a set more frequently under 15 times than over 15 times; similarly in the case of 80 sets each of 250 digits, and also in other aggregations." Its possible bearing on such experiments with dice and roulette as are described at the end of this chapter is clear. But apart from these artificial experiments, it is sometimes worth the statistician's while to bear in mind this appreciable want of symmetry in the distribution about the mode or most probable value, in many even of those cases in which Bernoullian conditions are strictly fulfilled.

17. I will conclude this chapter with an account of some of the attempts which have been made to verify a posteriori the conclusions of Bernoulli's Theorem. These attempts are nearly useless: first, because we can seldom be certain a priori that the conditions assumed in Bernoulli's Theorem are fulfilled; and, secondly, because the theorem predicts not what will happen but only what is, on certain evidence, likely to happen. Thus even where our results do not verify Bernoulli's Theorem, the theorem is not thereby discredited. The results have a bearing on the conditions in which the experiments took place, rather than upon the truth of the theorem. In spite, therefore, of the not unimportant place which these attempts have in the history of probability, their scientific value is very small. I record them because they have a good deal of historical and psychological interest, and because they satisfy a certain idle curiosity from which few students of probability are altogether free.1

18. The data for these investigations have been principally drawn from four sources: coin-tossing, the throw of dice, lotteries, and roulette; for in such cases as these the conditions for Bernoulli's Theorem seem to be fulfilled most nearly. The earliest recorded experiment was carried out by Buffon,2 who, assisted

1 Mr. Yule (Introduction to Statistics, p. 254) recommends its indulgence: "The student is strongly recommended to carry out a few series of such experiments personally, in order to acquire confidence in the use of the theory." Mr. Yule himself has indulged moderately.

2 Essai d'arithmétique morale (see Bibliography), published 1777, said to have been composed about 17<»().

by a child tossing a coin into the air, played 2048 partis of the
Petersburg  game,  in  which  a  coin  is  thrown  successively  until 
the  parti  is  brought  to  an  end  by  the  appearance  of  heads.  The 
same  experiment  was  repeated  by  a  young  pupil  of  De  Morgan's 
'  for  his  own  satisfaction.' 1  In  Buffon's  trials  there  were  1992 
tails  to  2048  heads  ;  in  Mr.  H.'s  (De  Morgan's  pupil)  2044  tails  to 
2048  heads.  A  further  experiment,  due  to  Buffon's  example, 
was  carried  out  by  Quetelet  2  in  1837.  He  drew  4096  balls  from 
an  urn,  replacing  them  each  time,  and  recorded  the  result  at 
different  stages,  in  order  to  show  that  the  precision  of  the  result 
tended  to  increase  with  the  number  of  the  experiments.  He 
drew  altogether  2066  white  balls  and  2030  black  balls.  Following 
in  this  same  tradition  is  the  experiment  of  Jevons,3  who  made 
2048  throws  of  ten  coins  at  a  time,  recording  the  proportion  of 
heads  at  each  throw  and  the  proportion  of  heads  altogether.  In 
the  whole  number  of  20,480  single  throws,  he  obtained  heads 
10,353  times.  More  recently  Weldon4  threw  twelve  dice  4096 
times,  recording  the  proportion  of  dice  at  each  throw  which 
showed  a  number  greater  than  three. 
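Bernoulli's Theorem supplies the yardstick against which such records are judged: how many standard deviations the observed count lies from its expectation. The Python sketch below assumes independent trials with probability 1/2 throughout; for Quetelet's urn this means assuming equal numbers of white and black balls, which the text does not state. Buffon's and Mr. H.'s partis are left out, since in the Petersburg game each parti ends with exactly one head, so the number of heads is fixed by the rules of the game rather than by chance. The function name `deviation_in_sds` is our own.

```python
import math


def deviation_in_sds(successes: int, trials: int, p: float = 0.5) -> float:
    """Deviation of an observed count from its expectation, in standard deviations,
    assuming independent trials each with probability p (a binomial model)."""
    expected = trials * p
    sd = math.sqrt(trials * p * (1 - p))
    return (successes - expected) / sd


# Counts quoted in the text; p = 1/2 is the assumed a priori probability.
experiments = {
    "Quetelet (white balls)": (2066, 4096),
    "Jevons (heads)": (10_353, 20_480),
}

for name, (hits, n) in experiments.items():
    print(f"{name}: {deviation_in_sds(hits, n):+.3f} standard deviations")
```

Neither record departs from expectation by much more than one and a half standard deviations, which is the kind of agreement these experimenters were looking for.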

All  these  experiments,  however,  are  thrown  completely  into 
the shade by the enormously extensive investigations of the Swiss
astronomer  Wolf,  the  earliest  of  which  were  published  in  1850 
and  the  latest  in  1893.5  In  his  first  set  of  experiments  Wolf 
completed  1000  sets  of  tosses  with  two  dice,  each  set  continuing 
until  every  one  of  the  21  possible  combinations  had  occurred  at 
least  once.  This  involved  altogether  97,899  tosses,  and  he  then 
completed  a  total  of  100,000.  These  data  enabled  him  to  work 
out  a  great  number  of  calculations,  of  which  Czuber  quotes  the 

following, namely a proportion of ·83533 of unlike pairs, as against
the theoretical value ·83333, i.e. 5/6. In his second set of experiments

1 Formal Logic, p. 185, published 1847. De Morgan gives Buffon's results,
as well as his pupil's, in full. Buffon's results are also investigated by Poisson,
Recherches, pp. 132-135.

2 Letters on the Theory of Probabilities (Eng. trans.), p. 37.

3 Principles of Science (2nd ed.), p. 208.

4 Quoted by Edgeworth, "Law of Error" (Ency. Brit. 10th ed.), and by
Yule, Introduction to Statistics, p. 254.

5 See Bibliography. Of the earliest of these investigations I have no first-hand
knowledge and have relied upon the account given by Czuber, loc. cit.
vol. i. p. 149. For a general account of empirical verifications of Bernoulli's
Theorem reference may be made to Czuber, Wahrscheinlichkeitsrechnung, vol. i.
pp. 139-152, and Czuber, Entwicklung der Wahrscheinlichkeitstheorie, pp. 88-91.

Wolf used two dice, one white and one red (in the first set
the  dice  were  indistinguishable),  and  completed  20,000  tosses,  the 
details of each result being recorded in the Vierteljahrsschrift der
Naturforschenden Gesellschaft in Zurich. He studied particularly
the  number  of  sequences  with  each  die,  and  the  relative  frequency 
of  each  of  the  36  possible  combinations  of  the  two  dice.  The 
sequences  were  somewhat  fewer  than  they  ought  to  have  been, 
and  the  relative  frequency  of  the  different  combinations  very 
different indeed from what theory would predict.1 The explanation
of this is easily found; for the records of the relative
frequency  of  each  face  show  that  the  dice  must  have  been  very 
irregular,  the  six  face  of  the  white  die,  for  example,  falling  38 
per  cent  more  often  than  the  four  face  of  the  same  die.  This, 
then, is the sole conclusion of these immensely laborious experiments:
that Wolf's dice were very ill made. Indeed the experiments
could have had no bearing except upon the accuracy
of  his  dice.  But  ten  years  later  Wolf  embarked  upon  one  more 
series  of  experiments,  using  four  distinguishable  dice, — white, 
yellow,  red,  and  blue, — and  tossing  this  set  of  four  10,000  times. 
Wolf  recorded  altogether,  therefore,  in  the  course  of  his  life 
280,000  results  of  tossing  individual  dice.  It  is  not  clear  that 
Wolf  had  any  well-defined  object  in  view  in  making  these 
records, which are published in curious conjunction with various
astronomical  results,  and  they  afford  a  wonderful  example  of  the 
pure  love  of  experiment  and  observation.2 
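Several of the figures quoted for Wolf can be reproduced from the assumption of fair, independent dice. The Python sketch below (an illustration under that assumption, not Wolf's or Czuber's own computation) counts the 21 combinations that can be told apart when the dice are indistinguishable, derives the 5/6 chance of an unlike pair, computes the standard deviation of each of the 36 combination counts in 20,000 tosses of distinguishable dice (which comes out near the 23·2 quoted by Czuber), and checks the total of 280,000 individual results. The factor 0·6745 for the 'probable limits' is the conventional probable-error multiplier.

```python
import math
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of two dice, assumed fair and independent.
outcomes = list(product(range(1, 7), repeat=2))

# With indistinguishable dice (Wolf's first set) only unordered pairs can be told apart.
unordered = {tuple(sorted(o)) for o in outcomes}

# Probability that the two faces differ ("unlike pairs").
p_unlike = Fraction(sum(a != b for a, b in outcomes), len(outcomes))

# Second set: 20,000 tosses of distinguishable dice; each of the 36 combinations has p = 1/36.
tosses = 20_000
p_combo = 1 / 36
sd_combo = math.sqrt(tosses * p_combo * (1 - p_combo))
probable_error = 0.6745 * sd_combo  # half-width of the 50 per cent 'probable limits'

# Individual die results: 100,000 tosses x 2 dice + 20,000 x 2 + 10,000 x 4.
total_results = 100_000 * 2 + 20_000 * 2 + 10_000 * 4

print(len(unordered), p_unlike, round(sd_combo, 1), round(probable_error, 1), total_results)
```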

19.  Another  series  of  calculations  have  been  based  upon  the 
ready-made  data  provided  by  the  published  results  of  lotteries 
and  roulette.3 

1 Czuber quotes the principal results (loc. cit. vol. i. pp. 149-151). The
frequencies of only 4, instead of 18, out of the 36 combinations lay within the
probable limits, and the standard deviation was 70·8 instead of 23·2.

2 The latest experiment of the kind, of which I am aware, is that of Otto
Meissner ("Würfelversuche," Zeitschrift für Math. und Phys. vol. 62 (1913), pp.
149 ff.), who recorded 24 series of 180 throws each with four distinguishable
dice.

3 For the publication of such returns there has always been a sufficient
demand on the part of gamblers. An Almanach romain sur la loterie royale de
France was published at Paris in 1830, which contained all the drawings of the
French lottery (two or three a month) from 1758 to 1830. Players at Monte
Carlo are provided with cards and pins with which to record the results of
successive coups, and the results at the tables are regularly published in Le
Monaco. Gamblers study these returns on account of the belief, which they
usually hold, that, as the number of cases is increased, the absolute deviation from
the most probable proportion becomes less, whereas at the best Bernoulli's


364  A  TREATISE  ON  PROBABILITY  PT.  v 

Czuber1  has  made  calculations  based  on  the  lotteries 
of Prague (2854 drawings) and Brünn (2703 drawings) between
the  years  1754  and  1886,  in  which  the  actual  results  agree 
very  well  with  theoretical  predictions.  Fechner  2  employed  the 
lists  of  the  ten  State  lotteries  of  Saxony  between  the  years  1843 
and  1852.  Of  a  rather  more  interesting  character  are  Professor 
Karl  Pearson's  investigations  3  into  the  results  of  Monte  Carlo 
Roulette  as  recorded  in  Le  Monaco  in  the  course  of  eight  weeks. 
Applying Bernoulli's Theorem, on the hypothesis of the equiprobability
of all the compartments throughout the investigation,
he found that the actually recorded proportions of red and black
were not unexpected, but that alternations and long runs were
so much in excess that, on the assumption of the exact accuracy
of the tables, the à priori odds were at least a thousand millions
to  one  against  some  of  the  recorded  deviations.  Professor 
Pearson  concluded,  therefore,  that  Monte  Carlo  Roulette  is  not 
objectively  a  game  of  chance  in  the  sense  that  the  tables  on  which 
it  is  played  are  absolutely  devoid  of  bias.  Here  also,  as  in  the 
case  of  Wolf's  dice,  the  conclusion  is  solely  relevant,  not  to  the 
theory  or  philosophy  of  Chance,  but  to  the  material  shapes  of 
the  tools  of  the  experiment. 

Professor  Pearson's  investigations  into  Roulette,  which  dealt 
with  33,000  Monte  Carlo  coups,  have  been  overshadowed,  just 

Theorem  shows  that  the  proportionate  deviation  decreases  while  the  absolute 
deviation increases. Cf. Houdin's Les Tricheries des Grecs dévoilées: "In a
game of chance, the oftener the same combination has occurred in succession, the
nearer we are to the certainty that it will not recur at the next cast or turn-up.
This is the most elementary of the theories on probabilities; it is termed the
maturity of the chances." Laplace (Essai philosophique, p. 142) quotes an
amusing instance of the same belief not drawn from the annals of gambling:
"I have seen men, ardently desiring to have a son, learn only with regret of the
births of boys in the month in which they were about to become fathers.
Imagining that the ratio of these births to those of girls ought to be the
same at the end of each month, they judged that the boys already born made
the next births of girls more probable."

The literature of gambling is very extensive, but, so far as I am acquainted
with it, excessively lacking in variety, the maturity of the chances and the
martingale continually recurring in one form or another. The curious reader
will find tolerable accounts of such topics in Proctor's Chance and Luck, and
Sir Hiram Maxim's Monte Carlo Facts and Fallacies.
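The gamblers' belief described above, that the absolute deviation diminishes as trials accumulate, can be checked exactly for a fair coin. The Python sketch below assumes independent tosses with probability 1/2 (`mean_abs_deviation` is a hypothetical helper name) and computes, with exact fractions, the expected absolute deviation of the number of heads from half the number of tosses. As the tosses increase, the absolute deviation grows while the proportionate deviation falls, which is what Bernoulli's Theorem asserts, and the reverse of the 'maturity of the chances'.

```python
from fractions import Fraction
from math import comb


def mean_abs_deviation(n: int) -> Fraction:
    """Exact E|X - n/2| for X ~ Binomial(n, 1/2), i.e. n independent fair tosses."""
    # |X - n/2| = |2X - n| / 2, and each of the 2**n sequences has probability 1 / 2**n.
    total = sum(comb(n, k) * abs(2 * k - n) for k in range(n + 1))
    return Fraction(total, 2 * 2**n)


for n in (100, 400, 1600):
    absolute = mean_abs_deviation(n)
    proportionate = absolute / n
    print(n, round(float(absolute), 2), round(float(proportionate), 4))
```

Quadrupling the number of tosses roughly doubles the absolute deviation and halves the proportionate one, in line with the square-root law.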

1 Zum Gesetz der grossen Zahlen. The results are summarised in his
Wahrscheinlichkeitsrechnung, vol. i. p. 130.

2 Kollektivmasslehre, p. 229. These results also are summarised by Czuber,
loc. cit.

3  The  Chances  of  Death,  vol.  i. 
as all other tosses of coins and dice have been outdone by Wolf,
by Dr. Karl Marbe,1 who has examined 80,000 coups from Monte
Carlo and elsewhere. Dr. Marbe arrived at exactly opposite
conclusions; for he claims to have shown that long runs, so far
from being in excess, were greatly in defect. Dr. Marbe introduces
this experimental result in support of his thesis that the
world  is  so  constituted  that  long  runs  do  not  as  a  matter  of  fact 
occur  in  it.2  Not  merely  are  long  runs  very  improbable.  They 
do  not,  according  to  him,  occur  at  all.  But  we  may  doubt 
whether  roulette  can  tell  us  very  much  either  of  the  laws  of  logic 
or  of  the  constitution  of  the  universe. 
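Whether long runs are 'in excess' or 'in defect', as Pearson and Marbe respectively claimed, is judged against the number of runs of each length expected by chance. A minimal Python sketch, assuming independent trials with two equally likely outcomes (roulette's zero is ignored, a simplification not made in the text), computes that expectation exactly by summing, over every possible starting position, the probability that a run of exactly k begins there. This is not the method used by Pearson, Marbe, or Von Bortkiewicz, and the function name is hypothetical.

```python
from fractions import Fraction


def expected_runs_of_length(n: int, k: int) -> Fraction:
    """Expected number of maximal runs of exactly k like results in n independent
    trials with two equally likely outcomes (e.g. red/black, zero ignored)."""
    half = Fraction(1, 2)
    total = Fraction(0)
    for start in range(1, n - k + 2):   # run occupies positions start .. start+k-1
        p = half ** (k - 1)             # the k-1 later members match the first
        if start > 1:
            p *= half                   # the preceding result differs
        if start + k - 1 < n:
            p *= half                   # the following result differs
        total += p
    return total


# Expected numbers of runs of length 1..10 in 1000 coups.
for k in range(1, 11):
    print(k, round(float(expected_runs_of_length(1000, k)), 2))
```

In a long series an interior run of exactly k needs k - 1 matches and two differing neighbours, so about n/2^(k+1) such runs are expected; the recorded counts are compared with figures of this kind.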

Dr.  Marbe's  main  thesis  is  identical,  as  he  himself  recognises, 
with  one  of  the  heterodox  contentions  of  D'Alembert.3  But  this 
principle  of  variety,  precisely  opposite  to  the  usual  principle  of 
Induction, can have no claim to be accepted à priori and, as a
general  principle,  there  is  no  adequate  evidence  to  support  it  from 
experience.  Its  origin  is  to  be  found,  perhaps,  in  the  fact  that 

1 Naturphilosophische Untersuchungen zur Wahrscheinlichkeitstheorie.

2 Dr. Marbe's monograph has given rise in Germany to a good deal of discussion,
not directed towards showing what a preposterous method this is for
demonstrating a natural law, but because the experimental result itself does not
really follow from the data and is due to a somewhat subtle error in Marbe's
reasoning, by which he has been led into an incorrect calculation of the probable
proportions à priori of the various sequences. The problem is discussed by
Von Bortkiewicz, Brömse, Bruns, Grimsehl, and Grünbaum (for exact references
to these see the Bibliography), and by Lexis (Abhandlungen, pp. 222-226) and
Czuber (Wahrscheinlichkeitsrechnung, vol. i. pp. 144-140). Largely as a result
of this controversy, Von Bortkiewicz has lately devoted a complete treatise
(Die Iterationen) to the mathematics of 'runs.' Dr. Marbe has been given
far more attention by his colleagues in Germany than he conceivably deserves.

3 D'Alembert's principal contributions to Probability are most accessible in
the volumes of his Opuscules mathématiques (1761). Works on Probability
usually contain some reference to D'Alembert, but his sceptical opinions,
rejected rather than answered by the orthodox school of Laplace, have not always
received full justice. D'Alembert has three main contentions to which in his
various papers he constantly recurs:

(1) That a probability very small mathematically is really zero;

(2) That the probabilities of two successive throws with a die are not
independent;

(3) That 'mathematical expectation' is not properly measured by the
product of the probability and the prize.

The first and third of these were partly advanced in explanation of the
Petersburg paradox (see p. 316). The second is connected with the first, and
was also used to support his incorrect evaluation of the probability of heads
twice running; but D'Alembert, in spite of many of his results being wrong,
does not altogether deserve the ridicule which he has suffered at the hands of
writers, who accepted without sceptical doubts the hardly less incorrect
conclusions of the orthodox theory of that time.
in a certain class of cases, especially where conscious human
agency  comes  in,  it  may  contain  some  element  of  truth.  The 
fact  of  an  act's  having  been  done  in  a  particular  way  once  may 
be  a  special  reason  for  thinking  that  it  will  not  be  performed  on 
the  next  occasion  in  precisely  the  same  manner.  Thus  in  many 
so-called  random  events  some  slight  degree  of  causal  and  material 
dependence  between  successive  occurrences  may,  nevertheless, 
exist.  In  these  cases  '  runs  '  may  be  fewer  and  shorter  than  those 
which  we  should  predict,  if  a  complete  absence  of  such  dependence 
is  assumed.  If,  for  example,  a  pack  of  cards  be  dealt,  collected, 
and  shuffled,  to  the  extent  that  card-players  do  as  a  rule  shuffle, 
there  may  be  a  greater  presumption  against  the  second  hand's 
being  identical  with  the  first  than  against  any  other  particular 
distribution.  In  the  case  of  croupiers  long  experience  might 
possibly  suggest  some  psychological  generalisation, — that  they 
are  very  mechanical,  giving  an  excess  of  numbers  belonging  to  a 
particular  section  of  the  wheel,  or,  on  the  other  hand,  that  when 
a  croupier  sees  a  run  beginning,  he  tends  to  vary  his  spin  more  than 
usual,  thus  bringing  runs  to  an  end  sooner  than  he  ought.1  At  any 
rate, it is worth emphasising once more that from such experiments
as these this is the only kind of knowledge which we can
hope  to  obtain, — knowledge  of  the  material  construction  of  a 
die  or  of  the  psychology  of  a  croupier. 

1 A good roulette table is, however, so delicate an instrument that no probable
degree of regularity of habit on the part of the spinner could be sufficient
to produce regularity in the result.


CHAPTER   XXX 

THE  MATHEMATICAL  USE  OF  STATISTICAL  FREQUENCIES  FOR 
THE  DETERMINATION  OF  PROBABILITY  A  POSTERIORI — THE 
METHODS  OF  LAPLACE 

The estimation of probabilities is of the greatest utility, although in legal and
political examples there is generally need not so much of subtle calculation as
of an accurate enumeration of all the circumstances.
LEIBNIZ.

1.  IN  the  preceding  chapter  we  have  assumed  that  the  probability 
of  an  event  at  each  of  a  series  of  trials  is  given,  and  have  considered 
how  to  infer  from  this  the  probabilities  of  the  various  possible 
frequencies  of  the  event  over  the  whole  series,  without  discussing 
in detail by what method the initial probability had been determined.
In statistical inquiries it is generally the case that this
initial probability is based, not upon the Principle of Indifference,
but upon the statistical frequencies of similar events
which have been observed previously. In this chapter, therefore,
we must commence the complementary part of our inquiry,
namely, into the method of deriving a measure of probability
from an observed statistical frequency.

I  do  not  myself  believe  that  there  is  any  direct  and  simple 
method  by  which  we  can  make  the  transition  from  an  observed 
numerical  frequency  to  a  numerical  measure  of  probability. 
The problem, as I view it, is part of the general problem of founding
judgments of probability upon experience, and can only be
dealt  with  by  the  general  methods  of  induction  expounded  in 
Part  III.  The  nature  of  the  problem  precludes  any  other  method, 
and  direct  mathematical  devices  can  all  be  shown  to  depend 
upon  insupportable  assumptions.  In  the  next  chapters  we  will 
consider  the  applicability  of  general  inductive  methods  to  this 
problem, and in this we will endeavour to discredit the mathematical
charlatanry by which, for a hundred years past, the basis
of  theoretical  statistics  has  been  greatly  undermined. 
2. Two direct methods have been commonly employed,
theoretically  inconsistent  with  one  another,  though  not  in  every 
case  noticeably  discrepant  in  practice.  The  first  and  simplest  of 
these  may  be  termed  the  Inversion  of  Bernoulli's  Theorem,  and 
the  other  Laplace's  Rule  of  Succession. 

The  earliest  discussion  of  this  problem  is  to  be  found  in  the 
Correspondence  of  Leibniz  and  Jac.  Bernoulli,1  and  its  true 
nature  cannot  be  better  indicated  than  by  some  account  of  the 
manner  in  which  it  presented  itself  to  these  very  illustrious 
philosophers.  The  problem  is  tentatively  proposed  by  Bernoulli 
in a letter addressed to Leibniz in the year 1703. We can determine
from à priori considerations, he points out, by how much it
is  more  probable  that  we  shall  throw  7  rather  than  8  with  two  dice, 
but  we  cannot  determine  by  such  means  the  probability  that  a 
young  man  of  twenty  will  outlive  an  old  man  of  sixty.  Yet  is  it 
not possible that we might obtain this knowledge à posteriori
from  the  observation  of  a  great  number  of  similar  couples,  each 
consisting  of  an  old  man  and  a  young  man  ?  Suppose  that  the 
young  man  was  the  survivor  in  1000  cases  and  the  old  man  in  500 
cases,  might  we  not  conclude  that  the  young  man  is  twice  as  likely 
as  the  old  man  to  be  the  survivor  ?  For  the  most  ignorant 
persons  seem  to  reason  in  this  way  by  a  sort  of  natural  instinct, 
and  feel  that  the  risk  of  error  is  diminished  as  the  number  of 
observations is increased. Might not the solution tend asymptotically
to some determinate degree of probability with the
increase of observations? "I do not know, most eminent Sir, whether these
speculations seem to you to have any solidity in them."

Leibniz's reply goes to the root of the difficulty. The calculation
of probabilities is of the utmost value, he says, but in statistical
inquiries there is need not so much of mathematical subtlety
as  of  a  precise  statement  of  all  the  circumstances.  The  possible 
contingencies  are  too  numerous  to  be  covered  by  a  finite  number 
of  experiments,  and  exact  calculation  is,  therefore,  out  of  the 
question.  Although  nature  has  her  habits,  due  to  the  recurrence 
of causes, they are general, not invariable. Yet empirical calculation,
although it is inexact, may be adequate in affairs of practice.2

1  For  the  exact  references  see  Bibliography. 

2 Leibniz's actual expressions (in a letter to Bernoulli, December 3, 1703) are
as follows: Utilissima est aestimatio probabilitatum, quanquam in exemplis
juridicis politicisque plerumque non tam subtili calculo opus est, quam accurata
omnium circumstantiarum enumeratione. Cum empirice aestimamus probabilitates
Bernoulli in his answer fell back upon the analogy of balls
drawn  from  an  urn,  and  maintained  that  without  estimating 
each  separate  contingency  we  might  determine  within  narrow 
limits  the  proportion  favouring  each  alternative.  If  the  true 
proportion  were  2  :  1,  we  might  estimate  it  with  moral  certainty 
à posteriori as lying between 201 : 100 and 199 : 100. "I am
certain," he concluded the controversy, "that the demonstration will
please you, when I have published it." But whether he was impressed by
the just caution of Leibniz, or whether death intercepted him,
he advances matters no further in the Ars Conjectandi. After
dealing  with  some  of  Leibniz's  objections  1  and  seeming  to 
promise  some  mode  of  estimating  probabilities  a  posteriori  by  an 
inversion  of  his  theorem,  he  proves  the  direct  theorem  only  and 
the  book  is  suddenly  at  an  end. 
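Bernoulli's claim that a true proportion of 2 : 1 could be fixed 'with moral certainty' between 201 : 100 and 199 : 100 can be given a rough scale using the direct theorem, which is all that Bernoulli actually proved. The Python sketch below uses the normal approximation to the binomial, not Bernoulli's own (more conservative) bound, and takes 'moral certainty' to mean odds of 1000 to 1; both are assumptions for illustration, and the function name is hypothetical. On these assumptions the required number of trials comes to roughly two million.

```python
import math
from statistics import NormalDist


def trials_for_ratio_bounds(p: float, lo_ratio: float, hi_ratio: float, odds: float) -> int:
    """Approximate number of trials needed for the observed proportion to fall, with
    probability odds/(odds + 1), in the band corresponding to ratios lo_ratio..hi_ratio.
    Uses the normal approximation to the binomial, not Bernoulli's own bound."""
    lo_p = lo_ratio / (1 + lo_ratio)   # a ratio r : 1 corresponds to a proportion r/(1 + r)
    hi_p = hi_ratio / (1 + hi_ratio)
    half_width = min(p - lo_p, hi_p - p)
    certainty = odds / (odds + 1)
    z = NormalDist().inv_cdf(0.5 + certainty / 2)   # two-sided critical value
    return math.ceil((z / half_width) ** 2 * p * (1 - p))


n_trials = trials_for_ratio_bounds(2 / 3, 1.99, 2.01, odds=1000)
print(n_trials)
```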

3. In dealing with the correspondence of Leibniz and Bernoulli,
I have not been mainly influenced by the historical interest
of  it.     The  view  of  Leibniz,  dwelling  mainly  on  considerations 
of  analogy,  and  demanding  "  not  so  much  mathematical  subtlety 
as  a  precise  statement  of  all  the  circumstances,"  is,  substantially, 
the view which will be supported in the following chapters.
The  desire  of  Bernoulli  for  an  exact  formula,  which  would  derive 
from   the    numerical   frequency   of   the   experimental   results   a 
numerical    measure    of    their    probability,    preludes    the    exact 
formulas  of  later  and  less  cautious  mathematicians,  which  will  be 
examined  immediately. 

4.  During  the  greater  part  of  the  eighteenth  century  there  is 
no  trace,  I  think,  of  the  explicit  use  of  the  Inversion  of  Bernoulli's 
Theorem.     The  investigations  carried  out  by  D'Alembert,  Daniel 
Bernoulli,  and  others  relied  upon  the  type  of  argument  examined 
in  Chapter  XXV.     They  showed,  that   is   to  say,  that  certain 
observed  series  of  events  would  have  been  very  improbable,  if 
we had supposed independence between some two factors or if


per experimenta successuum, quaeris an ea via tandem aestimatio perfecte
obtineri possit. Idque a Te repertum scribis. Difficultas in eo mihi videtur,
quod contingentia seu quae infinitis pendent circumstantiis, per finita experimenta
determinari non possunt; natura quidem suas habet consuetudines, natas
ex reditu causarum, sed non nisi ὡς ἐπὶ τὸ πολύ. Novi morbi inundant subinde
humanum genus, quodsi ergo de mortibus quotcunque experimenta feceris, non
ideo naturae rerum limites posuisti, ut pro futuro variare non possit. Etsi
autem empirice non possit haberi perfecta aestimatio, non ideo minus empirica
aestimatio in praxi utilis et sufficiens foret.

1 The relevant passages are on pp. 221-227 of the Ars Conjectandi.
some occurrence had been assumed to be as likely as not, and they
inferred from this that there was in fact a measure of dependence
or that the occurrence had probability in its favour. But they
did not endeavour to pass from the observed frequency of occurrence
to an exact measure of the probability. With the advent
of  Laplace  more  ambitious  methods  took  the  field. 

Laplace began by assuming without proof a direct inversion
of Bernoulli's Theorem. Bernoulli's Theorem, in the form in
which Laplace proved it, states that, if p is the probability à priori,
there is a probability P that the proportion of times m/(m + n)
of the event's occurrence in μ (= m + n) trials will lie between

$$ p \pm \gamma\sqrt{\frac{2pq}{\mu}}, \qquad q = 1 - p, $$

where

$$ P = \frac{2}{\sqrt{\pi}}\int_0^{\gamma} e^{-t^2}\,dt + \frac{1}{\sqrt{2\pi\mu pq}}\,e^{-\gamma^2}. $$

The inversion of the theorem, which he assumes without proof,
states that, if the event is observed to happen m times
in μ trials, there is a probability P that the probability
p of the event will lie between

$$ \frac{m}{\mu} \pm \frac{\gamma}{\mu}\sqrt{\frac{2mn}{\mu}}, $$

where

$$ P = \frac{2}{\sqrt{\pi}}\int_0^{\gamma} e^{-t^2}\,dt + \sqrt{\frac{\mu}{2\pi mn}}\,e^{-\gamma^2}. $$

The same result is also given by Poisson.1 Thus, given the frequency of occurrence in μ
trials, these writers infer the probability of occurrence at
subsequent trials within certain limits, just as, given the
à priori probability, Bernoulli's Theorem would enable them
to predict the frequency of occurrence in μ trials within
corresponding limits.
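The direct and inverted forms can be evaluated side by side, taking the limits as p ± γ√(2pq/μ) with q = 1 - p for the direct theorem and m/μ ± (γ/μ)√(2mn/μ) for the inversion. In this Python sketch (function names hypothetical), the term (2/√π)∫₀^γ e^(-t²) dt is computed with `math.erf`. Applied to Bernoulli's illustration of the young man surviving in 1000 cases and the old man in 500, the inversion with γ = 2 puts the probability at about ·667 ± ·034 with P ≈ ·996. The direct form with p = 2/3 yields the same interval, which is why the inversion looks so natural; whether it is legitimate is another matter.

```python
import math


def direct_limits(p: float, mu: int, gamma: float):
    """Bernoulli's Theorem in Laplace's form: with probability P the observed
    proportion lies in p +/- gamma * sqrt(2pq/mu)."""
    q = 1 - p
    half = gamma * math.sqrt(2 * p * q / mu)
    P = math.erf(gamma) + math.exp(-gamma**2) / math.sqrt(2 * math.pi * mu * p * q)
    return p - half, p + half, P


def inverse_limits(m: int, n: int, gamma: float):
    """Laplace's assumed inversion: with probability P the unknown probability lies
    in m/mu +/- (gamma/mu) * sqrt(2mn/mu), where mu = m + n."""
    mu = m + n
    centre = m / mu
    half = (gamma / mu) * math.sqrt(2 * m * n / mu)
    P = math.erf(gamma) + math.sqrt(mu / (2 * math.pi * m * n)) * math.exp(-gamma**2)
    return centre - half, centre + half, P


print(inverse_limits(1000, 500, 2.0))
print(direct_limits(2 / 3, 1500, 2.0))
```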

1 For an account of the treatments of this topic both by Laplace and by
Poisson, see Todhunter's History, pp. 554-557. Both of them also obtain a
formula slightly different from that given above by a method analogous to the
first part of the proof of Laplace's Rule of Succession; i.e., by an application of
the inverse principle of probability to the assumption that the probability of
the probability's lying within any interval is proportional to the length of the
interval. This discrepancy has given rise to some discussion. See Todhunter,
loc. cit.; De Morgan, On a Question in the Theory of Probabilities; Monro, On the
Inversion of Bernoulli's Theorem in Probabilities; and Czuber, Entwicklung,
pp. 83, 84. But this is not the important distinction between the two mathematical
methods by which this question has been approached, and this minor
point, which is of historical interest mainly, I forbear to enter into.
If the number of trials is at all numerous, these limits are
narrow and the purport of the inversion of Bernoulli's Theorem
may therefore be put briefly as follows. By the direct theorem,
if p measures the probability, p also measures the most probable
value of the frequency; by the inversion of the theorem, if
m/(m + n) measures the frequency, m/(m + n) also measures the most probable
value of the probability. The simplicity of the process has
recommended it, since the time of Laplace, to a great number of
writers.  Czuber's  argument,  criticised  on  p.  351,  with  reference 
to  the  proportions  of  male  and  female  births  in  Austria,  is  based 
upon  an  unqualified  use  of  it.  But  examples  abound  throughout 
the  literature  of  the  subject,  in  which  the  theorem  is  employed  in 
circumstances  of  greater  or  less  validity. 
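The claim that m/(m + n) is the most probable value of the probability can be read as saying that p^m (1 - p)^n, the chance of the observed result on the hypothesis p, is greatest at p = m/(m + n). That reading, which coincides with the mode of Laplace's posterior under a uniform prior, is our interpretation rather than the text's. A brute-force grid search in Python confirms it numerically; `most_probable_value` and the grid size are illustrative choices.

```python
import math


def most_probable_value(m: int, n: int, steps: int = 100_000) -> float:
    """Grid search for the p in (0, 1) maximising p**m * (1 - p)**n, the chance of
    m occurrences and n failures (the posterior mode under a uniform prior)."""
    best_p, best_ll = 0.0, -math.inf
    for i in range(1, steps):
        p = i / steps
        ll = m * math.log(p) + n * math.log(1 - p)   # log-likelihood avoids underflow
        if ll > best_ll:
            best_p, best_ll = p, ll
    return best_p


print(most_probable_value(1000, 500))   # close to 1000/1500
```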

The  theorem  was  originally  given  without  proof,  and  is  indeed 
incapable  of  it,  unless  some  illegitimate  assumption  has  been 
introduced. But, apart from this, there are some obvious objections.
We have seen in the preceding chapter that Bernoulli's
Theorem itself cannot be applied to all kinds of data indiscriminately,
but only when certain rather stringent conditions are fulfilled.
Corresponding conditions are required equally for the
inversion of the theorem, and it cannot possibly be inferred from
a statement of the number of trials and the frequency of occurrence
merely, that these have been satisfied. We must know,
for instance, that the examined instances are similar in the main
relevant particulars, both to one another and to the unexamined
instances to which we intend our conclusion to be applicable.
An unanalysed statement of frequency cannot tell us this.

This method of passing from statistical frequencies to probabilities
is not, however, like the method to be discussed in a
moment,  radically  false.  With  due  qualifications  it  has  its  place 
in  the  solution  of  this  problem.  The  conditions  in  which  an 
inversion  of  Bernoulli's  Theorem  is  legitimate  will  be  elucidated 
in  Chapter  XXXI.  In  the  meantime  we  will  pass  on  to  Laplace's 
second  method,  which  is  more  powerful  than  the  first  and  has 
obtained  a  wider  currency.  The  more  extreme  applications  of 
it are no longer ventured upon, but the theory which underlies
it is still widely adopted, especially by French writers upon
probability, and seldom repudiated.
5. The formula in question, which Venn 1 has called the Rule
of Succession, declares that, if we know no more than that an
event has occurred m times and failed n times under given conditions,
then the probability of its occurrence when those conditions
are next fulfilled is (m + 1)/(m + n + 2). It is necessary, however,
before we examine the proof of this formula, to discuss in detail
the reasoning which leads up to it.
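Under Laplace's assumption that every value of the unknown probability x between 0 and 1 is equally likely, the rule follows from two integrals: the probability of a further occurrence is ∫x^(m+1)(1 - x)^n dx divided by ∫x^m(1 - x)^n dx, both taken from 0 to 1, and ∫₀¹ x^a(1 - x)^b dx = a! b!/(a + b + 1)!. The Python sketch below evaluates this ratio with exact fractions to confirm that it reduces to (m + 1)/(m + n + 2). It illustrates the standard derivation under the very assumption examined below; the helper names are our own.

```python
from fractions import Fraction
from math import factorial


def beta_integral(a: int, b: int) -> Fraction:
    """Exact value of the integral of x**a * (1 - x)**b over [0, 1]: a! b! / (a + b + 1)!."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))


def rule_of_succession(m: int, n: int) -> Fraction:
    """Probability of a further occurrence after m occurrences and n failures, assuming
    (with Laplace) that every value of the unknown probability is equally likely."""
    return beta_integral(m + 1, n) / beta_integral(m, n)


print(rule_of_succession(0, 0), rule_of_succession(3, 1), rule_of_succession(10, 0))
```

With no observations the rule gives 1/2, and after ten successes without a failure it gives 11/12.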

This  preliminary  reasoning  involves  the  Laplacian  theory  of 
'  unknown  probabilities.'  The  postulate,  upon  which  it  depends, 
is  introduced  to  supplement  the  Principle  of  Indifference,  and 
is  in  fact  the  extension  of  this  principle  from  the  probabilities 
of  arguments,  when  we  know  nothing  about  the  arguments,  to  the 
probabilities  that  the  probabilities  of  arguments  have  certain 
values,  when  we  know  nothing  about  the  probabilities.  Laplace's 
enunciation is as follows: "When the probability of a simple event
is unknown, we may suppose it equally to have every value
from zero to unity. The probability of each
of these hypotheses, derived from the observed event, is . . . a
fraction whose numerator is the probability of the event on
this hypothesis, and whose denominator is the sum of the similar
probabilities relative to all the hypotheses. . . ." 2

Thus  when  the  probability  of  an  event  is  unknown,  we  may 
suppose  all  possible  values  of  the  probability  between  0  and  1  to 
be equally likely à priori. The probability, after the event has
occurred, that the probability à priori was 1/r (say), is measured
by a fraction of which 1/r is the numerator and the sum of all the
possible à priori values the denominator. The origin of this rule
is  evident.  If  we  consider  the  problem  in  which  a  ball  is  drawn 
from  a  bag  containing  an  infinite  number  of  black  and  white  balls 
in  unknown  proportions,  we  have  hypotheses,  corresponding  to 
each  of  the  possible  constitutions  of  the  bag,  the  assumption  of 
which yields in turn every value between 0 and 1 as the à priori
probability of drawing a white ball. If we could assume that
these constitutions are equally probable à priori, we should
obtain probabilities for each of them à posteriori according to
Laplace's  rule. 
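The bag example can be made concrete with finitely many constitutions. In this Python sketch (an illustration: N, the function names, and drawing with replacement are our assumptions), the bag admits the chances 0, 1/N, ..., 1 of drawing white, all equally probable à priori. After the drawings each hypothesis receives the probability of the observed result on that hypothesis, divided by the sum of such probabilities over all hypotheses, as in Laplace's enunciation; after a single white drawing the posterior for the chance k/N is therefore proportional to k/N, as in the 1/r rule above. As N grows, the probability that the next ball is white approaches (m + 1)/(m + n + 2).

```python
from fractions import Fraction


def posterior_over_constitutions(N: int, m: int, n: int):
    """Posterior probabilities of the hypotheses 'chance of white = k/N' (k = 0..N),
    taken as equally probable a priori, after m white and n black drawings."""
    values = [Fraction(k, N) for k in range(N + 1)]
    likelihoods = [v**m * (1 - v)**n for v in values]   # probability of the data on each hypothesis
    total = sum(likelihoods)
    return values, [lk / total for lk in likelihoods]


def predictive(N: int, m: int, n: int) -> Fraction:
    """Probability that the next drawing is white, averaged over the posterior."""
    values, post = posterior_over_constitutions(N, m, n)
    return sum(v * w for v, w in zip(values, post))


print(predictive(10, 1, 0), float(predictive(200, 3, 1)))
```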

1 Logic of Chance, p. 190.
2 Essai philosophique, p. 10.
On the analogy of this Laplace assumes in general that, where
everything is unknown, we may suppose an infinite number of
possibilities, each of which is equally likely, and each of which
leads to the event in question with a different degree of probability,
so  that  for  every  value  between  0  and  1  there  is  one  and  only  one 
hypothetical  constitution  of  things,  the  assumption  of  which 
invests  the  event  with  a  probability  of  that  value. 

6. It might be an almost sufficient criticism of the above to
point out that these assumptions are entirely baseless. But the
theory has taken so important a place in the development of
probability that it deserves a detailed treatment.

What, in the first place, does Laplace mean by an unknown
probability? He does not mean a probability whose value is in
fact unknown to us because we are unable to draw conclusions
which could be drawn from the data; he seems to apply the
term to any probability whose value, according to the argument
of Chapter III., is numerically indeterminate. Thus he assumes
that every probability has a numerical value and that, in those
cases where there seems to be no numerical value, this value is
not non-existent but unknown; and he proceeds to argue that
where the numerical value is unknown, or, as I should say, where
there is no such value, every value between 0 and 1 is equally
probable. With the possible interpretations of the term 'unknown
probability,' and with the theory that every probability
can be measured by one of the real numbers between 0 and 1,
I have dealt, as carefully as I can, in Chapter III. If the view
taken there is correct, Laplace's theory breaks down immediately.
But even if we were to answer these questions, not as they have
been answered in Chapter III., but in a manner favourable to
Laplace's theory, it remains doubtful whether we could legitimately
attribute a value to the probability of an unknown probability's
having such and such a value. If a probability is
unknown, surely the probability, relative to the same evidence,
that this probability has a given value is also unknown; and we
are involved in an infinite regress.

7. This point leads on to the second objection: Laplace's
theory requires the employment of both of two inconsistent
methods. Let us consider a number of alternatives $a_1, a_2$, etc.,
having probabilities $p_1, p_2$, etc.; if we do not know anything
about $a_1$, we do not know the value of its probability $p_1$, and we
must consider the various possible values of $p_1$, namely $b_1, b_2$, etc.,
the probabilities of these possible values being $q_1, q_2$, etc.
respectively. There is no reason why this process should ever stop.
For as we do not know anything about $b_1$, we do not know the
value of its probability $q_1$, and we must consider the various
possible values of $q_1$, namely $c_1, c_2$, etc., the probabilities of these
possible values being $r_1, r_2$, etc. respectively; and so on. This
method consists in supposing that, when we do not know anything
about an alternative, we must consider all the possible values of
the probability of the alternative; these possible values can form
in their turn a set of alternatives, and so on. But this method
by itself can lead to no final conclusion. Laplace superimposes
on it, therefore, his other method of determining the probabilities
of alternatives about which we know nothing, namely, the
Principle of Indifference. According to this method, when
we know nothing about a set of alternatives, we suppose the
probabilities of each of them to be equal. In some parts of
his writings (and this is true also of most of his followers) he
applies this method from the beginning. If, that is to say, we
know nothing about $a_1$, then, since $a_1$ and its contradictory form a pair
of exhaustive alternatives two in number, the probabilities of these
alternatives are equal and each is $\frac{1}{2}$. But in the reasoning which
leads up to the Law of Succession he chooses to apply this method
at the second stage, having used the other method at the first
stage. If, that is to say, we know nothing about $a_1$, its probability
$p_1$ may have any of the values $b_1, b_2$, etc., where $b_r$ is any
fraction between 0 and 1; and, as we know nothing about the
probabilities $q_1, q_2$, etc. of these alternatives $b_1, b_2$, etc., we may
by the Principle of Indifference suppose them to be equal. This
account may seem rather confused; but it is not easy to give
a lucid account of so confused a doctrine.

8. Turning aside from these considerations, let us examine
the theory, for a moment, from another side. When we reach the
Rule of Succession, it will be seen that the hypothetical a priori
probabilities are treated as if they were possible causes of the
event. It is assumed, that is to say, that the number of possible
sets of antecedent conditions is proportional to the number of
real numbers between 0 and 1; and that these fall into equal
groups, each group corresponding to one of the real numbers
between 0 and 1, this number measuring the degree of probability
with which we could predict the event, if we knew that an antecedent
condition belonging to that group was fulfilled. It is
then assumed that all of these possible antecedent conditions are
a priori equally likely. The argument has arisen by false analogy
from the problem in which a ball is drawn from an urn containing
an infinite number of black and white balls. But for the assumption
that we have in general the kind of knowledge which is
necessary about the possible antecedents, no reasonable foundation
has been suggested.

De Morgan endeavoured to deal with the difficulty in much
the same way in the following passage:1 "In determining the
chance which exists (under known circumstances) for the happening
of an event a number of times which lies between certain
limits, we are involved in a consideration of some difficulty,
namely, the probability of a probability, or, as we have called it,
the presumption of a probability. To make this idea more clear,
remember that any state of probability may be immediately
made the expression of the result of a set of circumstances, which
being introduced into the question, the difficulty disappears.
The word presumption refers distinctly to an act of the mind, or a
state of the mind, while in the word probability we feel disposed
rather to think of the external arrangements on the knowledge
of which the strength of our presumption ought to depend, than
of the presumption itself." The point of this explanation lies
in the assumption that "any state of probability may be immediately
made the expression of the result of a set of circumstances."
It cannot be allowed that this is generally true;2 and even in
those cases in which it is true, we are thrown back on the a priori
probabilities of the various sets of circumstances, which need not
be, as De Morgan assumes, either equal or exhaustive alternatives.

1 Cabinet Encyclopaedia, p. 87.

2 For instance, it is not true even in the standard instance of balls drawn from
an urn containing black and white in unknown proportions, unless the number
of balls is infinite.

9. The proof of the Rule of Succession, which is based upon
this theory of unknown probabilities, is, briefly, as follows:

If $x$ stands for the a priori probability of an event in given
conditions, then the probability that the event will occur $m$
times and fail $n$ times in these conditions is $x^m(1-x)^n$. If,
however, $x$ is unknown, all values of it between 0 and 1 are a priori
equally probable. It follows from these two sets of considerations
that, if the event has been observed to occur $m$ times out of $m+n$,
the probability a posteriori that $x$ lies between $x$ and $x+dx$ is
proportional to $x^m(1-x)^n\,dx$, and is equal, therefore, to
$Ax^m(1-x)^n\,dx$, where $A$ is a constant. Since the event has in fact
occurred, and since $x$ must have one of its possible values, $A$ is
determined by the equation

$$A\int_0^1 x^m(1-x)^n\,dx = 1, \qquad \text{whence} \qquad A = \frac{(m+n+1)!}{m!\,n!}.$$

Hence the probability that the event will occur at the $(m+n+1)$th
trial, when we know that it has occurred $m$ times in $m+n$
trials, is

$$A\int_0^1 x^{m+1}(1-x)^n\,dx.$$

If we substitute the value of $A$ found above, this is equal to
$\frac{m+1}{m+n+2}$.1
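The two integrals can be checked exactly with rational arithmetic. The sketch below assumes only the standard Beta-integral identity, that the integral of $x^a(1-x)^b$ over $[0, 1]$ equals $a!\,b!/(a+b+1)!$ for non-negative integers $a$ and $b$. The function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def beta_integral(a: int, b: int) -> Fraction:
    """Exact value of the integral of x**a * (1 - x)**b over [0, 1]."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def rule_of_succession(m: int, n: int) -> Fraction:
    """Probability of a further occurrence after m occurrences and n failures,
    assuming every a priori value of x in [0, 1] equally probable."""
    A = 1 / beta_integral(m, n)          # the normalising constant A
    return A * beta_integral(m + 1, n)   # A times the integral of x^(m+1) (1-x)^n

m, n = 3, 2
# A agrees with (m + n + 1)! / (m! n!).
assert 1 / beta_integral(m, n) == Fraction(factorial(m + n + 1), factorial(m) * factorial(n))
print(rule_of_succession(m, n))    # 4/7, i.e. (m + 1) / (m + n + 2)

# The special cases discussed in the text:
print(rule_of_succession(1, 0))    # observed once, occurred: 2/3
print(rule_of_succession(0, 0))    # never observed: 1/2
print(rule_of_succession(0, 1))    # observed once, failed: 1/3
```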

The class of problem to which the theorem is supposed to
apply is the following: There are certain conditions such that we
are ignorant a priori as to whether they do or do not lead to the
occurrence of a particular event; on $m$ out of $m+n$ occasions,
however, on which these conditions have been observed, the
event has occurred; what is the probability, in the light of this
experience, that the event will occur on the next occasion? The
answer to all such problems is $\frac{m+1}{m+n+2}$. In the cases where
$n = 0$, i.e. when the event has invariably occurred, the formula
yields the result $\frac{m+1}{m+2}$. In the case where the conditions have
been observed once only and the event has occurred on that
occasion, the result is $\frac{2}{3}$. If the conditions have never been met
with at all, the probability of the event is $\frac{1}{2}$. And even in the
case where, on the only occasion on which the conditions were
observed, the event did not occur, the probability is $\frac{1}{3}$.

1 The theorem is sometimes enunciated by contemporary writers in a much
more guarded form, e.g. by Czuber, Wahrscheinlichkeitsrechnung, vol. i. p. 197,
and by Bachelier, Calcul des probabilités, p. 487. Bachelier, instead of assuming
that the a priori probabilities of all possible values of the probability of the
event are equal, writes $\omega(y)\,dy$ as the a priori probability that the probability is
$y$, so that after $m$ occurrences in $m+n$ trials the probability that the probability
lies between $y$ and $y+dy$ is
$$\frac{y^m(1-y)^n\,\omega(y)\,dy}{\int_0^1 y^m(1-y)^n\,\omega(y)\,dy}.$$
If one has no idea of $\omega$ a priori, he suggests that the simplest hypothesis is to
put $\omega = 1$, which leads, as above, to Laplace's Law of Succession. He also proposes
the hypothesis $\omega(y) = a_0 + a_1y + a_2y^2 + \dots$, in which case the denominator is a
series of Eulerian integrals. There is a discussion of the Law of Succession, and of the
contradictions and paradoxes to which it leads, by E. T. Whittaker and others in
Part VI. vol. viii. (1920) of the Transactions of the Faculty of Actuaries in
Scotland.

Some of the flaws in this proof have been already explained.
One minor objection may be pointed out in addition. It is
assumed that, if $x$ is the a priori probability of the event's happening
once, then $x^n$ is the a priori probability of its happening $n$
times in succession, whereas by the theorem's own showing the
knowledge that the event has happened once modifies the probability
of its happening a second time; its successive occurrences
are not, therefore, independent. If the a priori probability of the
event is $\frac{1}{2}$ and if, after it has been observed once, the probability
that it will occur a second time is $\frac{2}{3}$, then it follows that the a
priori probability of its occurring twice is not $\frac{1}{2} \times \frac{1}{2}$ but $\frac{1}{2} \times \frac{2}{3}$,
i.e. $\frac{1}{3}$; and in general the a priori probability of its happening
$n$ times in succession is not $\left(\frac{1}{2}\right)^n$ but $\frac{1}{n+1}$.
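The arithmetic of this objection can be checked by chaining the Rule of Succession. With $k$ previous occurrences and no failures, the next occurrence has probability $(k+1)/(k+2)$. The product of these factors for $k = 0, \dots, n-1$ telescopes to $1/(n+1)$, which is also the integral of $x^n$ over $[0, 1]$ under the uniform a priori distribution. A minimal sketch, with an illustrative function name:

```python
from fractions import Fraction

def prob_n_in_succession(n: int) -> Fraction:
    """A priori probability of n successive occurrences, multiplying the
    Rule-of-Succession probabilities (k + 1)/(k + 2) for k = 0, ..., n - 1."""
    total = Fraction(1)
    for k in range(n):
        total *= Fraction(k + 1, k + 2)
    return total

for n in range(1, 6):
    naive = Fraction(1, 2) ** n        # treats the occurrences as independent
    chained = prob_n_in_succession(n)  # equals 1 / (n + 1)
    print(n, naive, chained)
```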

10. But refinements of disproof are hardly needed. The
principle's conclusion is inconsistent with its premisses. We
begin with the assumption that the a priori probability of an event,
about which we have no information and no experience, is unknown,
and that all values between 0 and 1 are equally probable.
We end with the conclusion that the a priori probability of
such an event is $\frac{1}{2}$. It has been pointed out in § 7 that this
contradiction was latent as soon as the Principle of Indifference
was superimposed on the principle of unknown probabilities.

The theorem's conclusions, moreover, are a reductio ad
absurdum of the reasoning upon which it is based. Who could
suppose that the probability of a purely hypothetical event, of
whatever complexity, in favour of which no positive argument
exists, the like of which has never been observed, and which has
failed to occur on the one occasion on which the hypothetical
conditions were fulfilled, is no less than $\frac{1}{3}$? Or if we do suppose it,
we are involved in contradictions, for it is easy to imagine more
than three incompatible events which satisfy these conditions.

11. The theorem was first suggested by the problem of the urn
which contains black and white balls in unknown proportions:
$m$ white and $n$ black balls have been successively drawn and
replaced; what is the probability that the next draw will yield
a white ball? It is supposed that all compositions of the urn are
equally probable, and the proof then proceeds precisely as in the
case of the more general rule of succession. The rule of succession
has sometimes been deduced directly from the case of the urn,
by assimilating the occurrence of the event to the drawing of a
white ball and its non-occurrence to the drawing of a black ball.

On the hypothesis that all compositions of the urn are equally
probable, an hypothesis to which in general there is nothing corresponding,
and on the further hypothesis that the number of balls
is infinite, this solution is correct.1 But the rule of succession
does not apply, as it is easy to demonstrate, even to the case of
balls drawn from an urn, if the number of balls is finite.2
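The finite-urn correction can be checked numerically. Assume, as in the footnote to this section, that the $n+1$ possible compositions of an urn of $n$ balls ($r$ black, $r = 0, \dots, n$) are equally likely. Then the chance that the next ball is black, after $p$ black and $q$ white balls have been drawn and replaced, is $\frac{1}{n}\sum_r r^{p+1}(n-r)^q \big/ \sum_r r^p(n-r)^q$. The function name below is illustrative.

```python
from fractions import Fraction

def finite_urn_next_black(n: int, p: int, q: int = 0) -> Fraction:
    """Chance that the next ball is black for an urn of n balls, taking the
    n + 1 compositions (r black, r = 0..n) as equally likely a priori, after
    p black and q white balls have been drawn and replaced."""
    num = sum(Fraction(r, n) ** (p + 1) * Fraction(n - r, n) ** q for r in range(n + 1))
    den = sum(Fraction(r, n) ** p * Fraction(n - r, n) ** q for r in range(n + 1))
    return num / den

# After a single black ball (p = 1) the infinite-urn rule gives (p+1)/(p+2) = 2/3.
for n in (2, 10, 100, 1000):
    print(n, finite_urn_next_black(n, p=1))   # e.g. 5/6 for n = 2
```

For small urns the answer differs markedly from $(p+1)/(p+2)$, and it approaches that value only as $n$ grows without limit.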

1 This second condition is often omitted (e.g. Bertrand, Calcul des probabilités, p. 172).

2 The correct solution for the case of a finite number of balls, on the hypothesis
that each possible ratio is equally likely, is as follows: The probability
of a black ball at a further trial, after black balls have been successively withdrawn
and replaced $p$ times, is $\frac{1}{n}\frac{S_{p+1}}{S_p}$, where there are $n$ balls and $S_r$ represents
the sum of the $r$th powers of the first $n$ natural numbers. This reduces to
$\frac{p+1}{p+2}$, the solution usually given, when $n$ is infinite. More generally, if
$p$ black balls and $q$ white balls have been drawn and replaced, the chance
that the next ball will be black is
$$\frac{1}{n}\,\frac{\sum_{r=0}^{n} r^{p+1}(n-r)^q}{\sum_{r=0}^{n} r^p(n-r)^q}.$$

3 See Chapter VIII.

12. If the Rule of Succession is to be adopted by adherents of
the Frequency Theory of Probability,3 it is necessary that they
should make some modification in the preliminary reasoning on
which it is based. By Dr. Venn, however, the rule has been
explicitly rejected on the ground that it does not accord with
experience.1 But Professor Karl Pearson, who accepts it, has
made the necessary restatement,2 and it will be worth while to
examine the reasoning when it is put in this form. Professor
Pearson's proof of the Rule of Succession is as follows:

"I start, as most mathematical writers have done, with 'the
equal distribution of ignorance,' or I assume the truth of Bayes'
Theorem. I hold this theorem not as rigidly demonstrated, but
I think with Edgeworth3 that the hypothesis of the equal distribution
of ignorance is, within the limits of practical life, justified
by our experience of statistical ratios, which a priori are
unknown, i.e. such ratios do not tend to cluster markedly round
any particular value. 'Chances' lie between 0 and 1, but our
experience does not indicate any tendency of actual chances to
cluster round any particular value in this range. The ultimate
basis of the theory of statistics is thus not mathematical but
observational. Those who do not accept the hypothesis of the
equal distribution of ignorance and its justification in observation
are compelled to produce definite evidence of the clustering of
chances, or to drop all application of past experience to the judgment
of probable future statistical ratios. . . .

"Let the chance of a given event occurring be supposed to lie
between $x$ and $x+dx$; then if on $n = p + q$ trials an event has been
observed to occur $p$ times and fail $q$ times, the probability that
the true chance lies between $x$ and $x+dx$ is, on the equal
distribution of our ignorance,

$$\frac{x^p(1-x)^q\,dx}{\int_0^1 x^p(1-x)^q\,dx}.$$

"This is Bayes' Theorem. . . .4

1 Logic of Chance, p. 197.

2 "On the Influence of Past Experience on Future Expectation," Phil. Mag., 1907,
pp. 365-378. The quotations given below are taken from this article.

3 This reference is, no doubt, to Edgeworth's "Philosophy of Chance"
(Mind, 1884, p. 230), where he wrote: "The assumption that any probability-constant
about which we know nothing in particular is as likely to have one value
as another is grounded upon the rough but solid experience that such constants
do, as a matter of fact, as often have one value as another." See also Chapter
VII. § 6, above.

4 Professor Pearson's use of this title for the above formula is not, I think,
historically correct. Bayes' Theorem is the Inverse Principle of Probability
itself, and not this extension of it.
"Now suppose that a second trial of $m = r + s$ instances be
made; then the probability that the given event will occur $r$ times
and fail $s$ times is, on the a priori chance being between $x$ and $x+dx$,

$$\frac{m!}{r!\,s!}\,x^r(1-x)^s,$$

and accordingly the total chance $C_r$, whatever $x$ may be, of the
event occurring $r$ times in the second series is

$$C_r = \frac{m!}{r!\,s!}\,\frac{\int_0^1 x^{p+r}(1-x)^{q+s}\,dx}{\int_0^1 x^p(1-x)^q\,dx}.$$

This is, with a slight correction, Laplace's extension of Bayes'
Theorem."1
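In modern terms Professor Pearson's $C_r$ is a beta-binomial probability. The sketch below evaluates it exactly, again assuming the identity that the integral of $x^a(1-x)^b$ over $[0, 1]$ is $a!\,b!/(a+b+1)!$. The function names are illustrative. Two checks follow from the formula. First, the values $C_r$ for $r = 0, \dots, m$ sum to 1. Second, a second series of a single trial ($m = r = 1$) gives back the Rule of Succession, $(p+1)/(p+q+2)$.

```python
from fractions import Fraction
from math import comb, factorial

def beta_integral(a: int, b: int) -> Fraction:
    """Exact integral of x**a * (1 - x)**b over [0, 1]."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def pearson_C(r: int, m: int, p: int, q: int) -> Fraction:
    """Chance of r occurrences and s = m - r failures in a second series of m
    trials, after p occurrences and q failures in the first series, on the
    equal distribution of ignorance."""
    s = m - r
    return comb(m, r) * beta_integral(p + r, q + s) / beta_integral(p, q)

# Pearson's illustration: 10 affected in a first sample of 100.
p, q, m = 10, 90, 100
dist = [pearson_C(r, m, p, q) for r in range(m + 1)]
print(sum(dist))                   # 1
print(float(sum(dist[8:14])))      # chance of 8 to 13 affected in the second sample
```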

13. This argument can be restated as follows. Of all the
objects which satisfy $\phi(x)$, let us suppose that a proportion $p$
also satisfy $f(x)$. In this case $p$ measures the probability that
any object, of which we know only that it is $\phi$, is in fact also $f$.
Now if we do not know the value of $p$ and have no relevant information
which bears upon it, we can assume a priori that all
values of $p$ between 0 and 1 are equally likely. This assumption,
which is termed the 'equal distribution of ignorance,' is justified
by our experience of statistical ratios. Our experience, that is
to say, leads us to suppose that of all the theories which could be
propounded, there are just as many which are always true as
there are which are always false, just as many which are true once
in fifty times as there are which are true once in three times, and
so on. Professor Pearson challenges those who do not accept
this assumption to produce definite evidence to the contrary.

The challenge is easily met. It would not be difficult to produce
10,000 positive theories which are always false corresponding
to every one which is always true, and 10,000 correlations of positive
qualities which hold less often than once in three times for
every one we can name which holds more often than once in three
times. And the converse is the case for negative theories and
correlations between negative qualities; for corresponding to
every positive theory which is true there is a negative theory
which is false, and so on. Thus experience, if it shows anything,
shows that there is a very marked clustering of statistical ratios
in the neighbourhoods of zero and unity: of those for positive
theories and for correlations between positive qualities in the
neighbourhood of zero, and of those for negative theories and for
correlations between negative qualities in the neighbourhood of
unity. Moreover, we are seldom in so complete a state of ignorance
regarding the nature of the theory or correlation under
investigation as not to know whether or not it is a positive theory
or a correlation between positive qualities. In general, therefore,
whenever our investigation is a practical one, experience, if it
tells us anything, tells us not only that the statistical ratios cluster
in the neighbourhood of zero and unity, but in which of these two
neighbourhoods the ratio in this particular case is most likely
a priori to be found. If we seek to discover what proportion of
the population suffer from a certain disease, or have red hair, or
are called Jones, it is preposterous to suppose that the proportion
is as likely a priori to exceed as to fall short of (say) fifty per cent.
As Professor Pearson applies this method to investigations where
it is plain that the qualities involved are positive, he seems to
maintain that experience shows that there are as many positive
attributes which are shared by more than half of any population
as there are which are shared by less than half.

1 The rest of the article is concerned with the determination of the probable
error when Laplace's Rule of Succession is used not simply to yield the probability
of a single additional occurrence, but to predict the probable limits within
which the frequency will lie in a considerable series of additional trials. Professor
Pearson's method applies more rigorous methods of approximation to
the fundamental formulae given above than have sometimes been used. As
my main purpose in this chapter is to dispute the general validity of the fundamental
formulae, it is not worth while to consider these further developments
here. If the validity of the fundamental formula were to be granted, Professor
Pearson's methods of approximation would, I think, be satisfactory.

It is also worth while to point out that it is formally impossible
that it should be true of all characters, simple and complex, that
they are as likely to have any one frequency as any other. For let
us take a character $c$ which is compounded of two characters $a$ and
$b$, between which there is no association, and let us suppose that
$a$ has a frequency $x$ in the population in question and that $b$ has
a frequency $y$, so that, in the absence of association, the frequency
$z$ of $c$ is equal to $xy$. Then it is easy to show that, if all values of
$x$ and $y$ between 0 and 1 are equally probable, all values of $z$
between 0 and 1 are not equally probable. For the probability density
of $z$ is $-\log z$, so that small values of $z$ are more probable than
large ones, and the possible values of $z$ become increasingly improbable
the further they lie from 0.
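The claim about the compound character can be checked directly. Assume $x$ and $y$ are independent and uniformly distributed on $[0, 1]$. Then the distribution function of $z = xy$ is $z - z\log z$, obtained by integrating the density $-\log t$ from 0 to $z$; a uniformly distributed $z$ would give simply $z$. The sketch below evaluates this at $z = 1/2$ and confirms it by a seeded simulation. The function name is illustrative.

```python
import math
import random

def cdf_product_of_uniforms(z: float) -> float:
    """P(xy <= z) when x and y are independent and uniform on [0, 1].

    Integrating the density -ln(t) from 0 to z gives z - z*ln(z)."""
    if z <= 0.0:
        return 0.0
    if z >= 1.0:
        return 1.0
    return z - z * math.log(z)

# A uniform z would give 0.5 here; the product gives (1 + ln 2)/2.
print(round(cdf_product_of_uniforms(0.5), 3))   # 0.847

# Rough Monte Carlo check (seeded for reproducibility).
rng = random.Random(0)
trials = 100_000
hits = sum(rng.random() * rng.random() <= 0.5 for _ in range(trials))
print(hits / trials)
```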

It may be added that the conclusions which Professor
Pearson himself derives from this method provide a reductio
ad absurdum of the arguments upon which they rest. He considers,
for example, the following problem: A sample of 100 of a
population shows 10 per cent affected with a certain disease.
What percentage may be reasonably expected in a second sample
of 100? By approximation he reaches the conclusion that the
percentage of the character in the second sample is as likely to
fall inside as outside the limits 7.85 and 13.71. Apart from the
preceding criticisms of the reasoning upon which this depends,
it does not seem reasonable upon general grounds that we should
be able on so little evidence to reach so certain a conclusion. The
argument does not require, for example, that we have any knowledge
of the manner in which the samples are chosen, of the
positive and negative analogies between the individuals, or indeed
anything at all beyond what is given in the above statement.
The method is, in fact, much too powerful. It invests any positive
conclusion which it is employed to support with far too high
a degree of probability. Indeed this is so foolish a theorem
that to entertain it is discreditable.

14. The Rule of Succession has played a very important part
in the development of the theory of probability. It is true that
it has been rejected by Boole1 on the ground that the hypotheses
on which it is based are arbitrary, by Venn2 on the ground that it
does not accord with experience, by Bertrand3 because it is
ridiculous, and doubtless by others also. But it has been very
widely accepted: by De Morgan,4 by Jevons,5 by Lotze,6 by
Czuber,7 and by Professor Pearson,8 to name some representative
writers of successive schools and periods. And, in any case, it
is of interest as being one of the most characteristic results of a
way of thinking in probability introduced by Laplace, and never
thoroughly discarded to this day. Even amongst those writers
who have rejected or avoided it, this rejection has been due
more to a distrust of the particular applications of which the law
is susceptible than to fundamental objections against almost
every step and every presumption upon which its proof depends.

1 Laws of Thought, p. 369. 2 Logic of Chance, p. 197.

3 Calcul des probabilités, p. 174.

4 Article in Cabinet Encyclopaedia, p. 64. 5 Principles of Science, p. 297.

6 Logic, pp. 373, 374; Lotze propounds a "simple deduction" "as convincing"
to him "as the more obscure analysis, by which it is usually obtained."
The proof is among the worst ever conceived, and may be commended to those
who seek instances of the profound credulity of even considerable thinkers.

7 Wahrscheinlichkeitsrechnung, vol. i. p. 197, though much more guardedly
and with more qualifications than in the form discussed above.

8 Loc. cit.
Some of these particular applications have certainly been
surprising. The law, as is evident, provides a numerical measure
of the probability of any simple induction, provided only that our
ignorance of its conditions is sufficiently complete, and, although,
when the number of cases dealt with is small, its results are incredible,
there is, when the number dealt with is large, a certain
plausibility in the results it gives. But even in these cases
paradoxical conclusions are not far out of sight. When Laplace
proves that, account being taken of the experience of the human
race, the probability of the sun's rising to-morrow is 1,826,214 to 1,
this large number may seem in a kind of way to represent our
state of mind about the matter. But an ingenious German,
Professor Bobek,1 has pushed the argument a degree further, and
proves by means of these same principles that the probability of
the sun's rising every day for the next 4000 years is not more,
approximately, than two-thirds, a result less dear to our natural
prejudices.

1 Lehrbuch der Wahrscheinlichkeitsrechnung, p. 208.
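Both figures can be reproduced from the Rule of Succession. Laplace's odds of 1,826,214 to 1 correspond to m = 1,826,213 days on which the sun has been observed to rise, since m uninterrupted occurrences give odds of (m+1) to 1 on the next. The chance of k further occurrences in a row is the telescoping product (m+1)/(m+2) × ... × (m+k)/(m+k+1) = (m+1)/(m+k+1). Taking 4000 years as 4000 × 365 days is an assumption made here and not Bobek's stated figure. On that assumption the result is about 0.56, which agrees with "not more, approximately, than two-thirds". The function name is illustrative.

```python
from fractions import Fraction

def prob_next_k_occurrences(m: int, k: int) -> Fraction:
    """Rule-of-Succession probability that an event observed m times without
    failure occurs on each of the next k occasions: (m + 1)/(m + k + 1)."""
    return Fraction(m + 1, m + k + 1)

m = 1_826_213          # days implied by Laplace's odds of (m + 1) to 1
k = 4000 * 365         # assumed length of 4000 years, ignoring leap days

p_tomorrow = prob_next_k_occurrences(m, 1)
print(p_tomorrow / (1 - p_tomorrow))              # odds: 1826214
print(float(prob_next_k_occurrences(m, k)))       # about 0.556
```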


CHAPTER XXXI

THE INVERSION OF BERNOULLI'S THEOREM

1. I CONCLUDE, then, that the application of the mathematical
methods, discussed in the preceding chapter, to the general
problem of statistical inference is invalid. Our state of knowledge
about our material must be positive, not negative, before
we can proceed to such definite conclusions as they purport to
justify. To apply these methods to material unanalysed in
respect of the circumstances of its origin, and without reference
to our general body of knowledge, merely on the basis of arithmetic
and of those of the characteristics of our material with
which the methods of descriptive statistics are competent to
deal, can only lead to error and to delusion.

But I go further than this in my opposition to them. Not
only are they the children of loose thinking, and the parents of
charlatanry. Even when they are employed by wise and competent
hands, I doubt whether they represent the most fruitful
form in which to apply technical and mathematical methods to
statistical problems, except in a limited class of special cases.
The methods associated with the names of Lexis, Von Bortkiewicz,
and Tschuprow (of whom the last named forms a link, to some
extent, between the two schools), which will be briefly described
in the next chapter, seem to me to be much more clearly consonant
with the principles of sound induction.

2. Nevertheless it is natural to suppose that the fundamental
ideas, from which these methods have sprung, are not wholly
égarés (astray). It is reasonable to presume that, subject to suitable
conditions and qualifications, an inversion of Bernoulli's Theorem
must have validity. If we knew that our material could be
likened to a game of chance, we might expect to infer chances
from frequencies, with the same sort of confidence as that with
which we infer frequencies from chances. This part of our
inquiry will not be complete, therefore, until we have endeavoured
to elucidate the conditions for the validity of an Inversion of
Bernoulli's Theorem.

3. The problem is usually discussed in terms of the happening
of an event under certain conditions, that is to say, of the coexistence
of the conditions, as affecting a particular event, with
that event. The same problem can be dealt with more generally
and more conveniently as an investigation of the correlation
between two characters A(x) and B(x), which, as in Part III.,
are propositional functions which may be said to concur or coexist
when they are both true of the same argument x. Given
that, within the field of our knowledge, B(x) is true for a certain
proportion of the values of x for which A(x) is true, what is the
probability for a further value a of x that, if A(a) holds, B(a) will
hold also?

Let us suppose that the occurrence of an instance of A(x) is a
sign of one of the events $e_1(x), e_2(x), \ldots, e_m(x)$, and that these
are exhaustive, exclusive, and ultimate alternatives. By exhaustive
it is meant that, whenever there is an instance of A(x),
one of the e's is present; by exclusive, that the presence of one
of the e's is not a sign of the presence of any other, but not that
the concurrence of two or more of the e's is in fact impossible;
by ultimate, that no one of the e's is a disjunction of two or more
alternatives which might themselves be members of the e's.
Let us assume that these alternatives are initially and throughout
the argument equally probable, which, subject to the above conditions,
is justified by the Principle of Indifference. We have no
reason, that is to say, and no part of our evidence ever gives us
one, for thinking that A(x) is more likely to be a sign of one of the
e's than of any other, or even for thinking that some e's, although
we do not know which, are more likely to occur than others.
Let us also assume that, out of $e_1(x), e_2(x), \ldots, e_m(x)$, the set
$e_1(x), e_2(x), \ldots, e_l(x)$, and these only, are signs or occasions of
B(x); and further that we have no evidence bearing on the actual
magnitude of the integers l and m, so that the ratio l/m is the
only factor of which the probability varies as the evidence
accumulates. Let us assume, lastly, that our knowledge of the
several instances of B(x) is adequate to establish a perfect analogy
between them; the instances a, etc., of B(x), that is to say, must
not have anything in common except B, unless we have reason
to know that the additional resemblances are immaterial. Even
with these considerable simplifications not every difficulty has
been avoided. But a development along the usual lines with
the assistance of Bernoulli's Theorem is now possible.

Let l/m = q. If the value of q were known, the problem would
be solved. For this numerical ratio would represent the probability
that A is, in any random instance, a sign of B; and no
further evidence, which satisfies the conditions of the preceding
hypothesis, can possibly modify it. But in the inverse problem
q is not known; and our problem is to determine whether evidence
can be forthcoming of such a kind that, as this evidence is increased
in quantity, the probability that A will be in any instance
a sign of B tends to a limit which lies between two determinate
ratios, just as the probability of an inductive generalisation may
tend towards certainty when the evidence is increased in a
manner satisfying given conditions.
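When q is known, Bernoulli's Theorem can be applied in the direct direction. The sketch below uses hypothetical values l = 3 and m = 10 and a tolerance eps = 0.05 of our own choosing. It computes the exact binomial probability that the observed frequency falls within eps of q, and shows that probability approaching certainty as the number of instances n grows. This is the direct problem; the inverse problem treated in this chapter runs the other way.

```python
from math import comb


def prob_frequency_near(q: float, n: int, eps: float) -> float:
    """Direct use of Bernoulli's Theorem: the exact binomial probability that
    the observed frequency r/n lies within eps of a known chance q."""
    return sum(
        comb(n, r) * q**r * (1 - q) ** (n - r)
        for r in range(n + 1)
        if abs(r / n - q) <= eps
    )


# Hypothetical case: l = 3 of m = 10 equally probable alternatives are signs of B.
l, m = 3, 10
q = l / m
for n in (10, 100, 1000):
    print(n, round(prob_frequency_near(q, n, eps=0.05), 4))
```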

Let f(q) represent the proposition that q is the true value of
l/m. Let q' represent the ratio of the number of instances actually
before us in which A has been accompanied by B to that of the
instances in which A has not been accompanied by B; and let
f'(q') be the proposition which asserts this. Now if the ratio q
is known, then, subject to the assumptions already stated, the
number q must also represent the a priori probability in any
instance, both before and after the results of other instances are
known, that A, if it occurs, will be accompanied by B. We have,
in fact, the conditions, as set forth in Chapter XXIX., in which
Bernoulli's Theorem can be validly applied, so that this theorem
enables us to give a numerical value, for all numerical values of
q and q', to the probability $f'(q')/h\,f(q)$, which expression represents
the likelihood a priori of the frequency q', given q.

An application of the inverse formula allows us to infer from
the above the a posteriori probability of q, given q', namely:

$$
f(q)/h\,f'(q') = \frac{f'(q')/h\,f(q) \cdot f(q)/h}{\sum_{i} f'(q')/h\,f(q_i) \cdot f(q_i)/h},
$$

where the summation in the denominator covers all possible
values $q_i$ of q. In rough applications of this inverse of Bernoulli's
Theorem it has been usual to suppose that f(q)/h is constant for
all values of q: that, in other words, all possible values of the
ratio q are a priori equally likely. If this supposition were
legitimate, the formula could be reduced to the algebraical expression

$$
f(q)/h\,f'(q') = \frac{f'(q')/h\,f(q)}{\sum_{i} f'(q')/h\,f(q_i)},
$$

all the terms of which can be determined numerically by Bernoulli's
Theorem. It is easy to show that it is a maximum when
q = q', i.e. that q' is the most probable value of l/m, and that,
when the instances are very numerous, it is very improbable that
l/m differs from q' widely. If, therefore, the number of instances
is increased in such a manner that the ratio continues in the
neighbourhood of q', the probability that the true value of l/m
is nearly q' tends to certainty; and, consequently, the probability
that A is in any instance a sign of B also tends to a
magnitude which is measured by q'.
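The reduced formula can be tried out numerically. In the sketch below the possible values of q are placed, for simplicity, on the grid k/μ with a hypothetical μ = 20 (strictly, the possible values are all fractions l/m with m ≤ μ). The prior is the uniform one just described, and q' is taken to be the observed proportion r/n of instances of A accompanied by B. The binomial coefficient is dropped from the likelihood because it cancels between numerator and denominator. The most probable value comes out at q', and the posterior mass within 1/μ of q' grows with n.

```python
from math import exp, inf, log


def log_likelihood(q: float, n: int, r: int) -> float:
    """log P(r of n instances show B | q); the binomial coefficient is omitted
    because it is the same for every q and cancels in the inverse formula."""
    if q == 0.0:
        return 0.0 if r == 0 else -inf
    if q == 1.0:
        return 0.0 if r == n else -inf
    return r * log(q) + (n - r) * log(1 - q)


def posterior(grid: list[float], prior: list[float], n: int, r: int) -> list[float]:
    """Inverse formula: prior times likelihood, normalised over every q in grid."""
    lls = [log_likelihood(q, n, r) for q in grid]
    top = max(lls)  # subtract the maximum before exponentiating, for stability
    joint = [exp(ll - top) * w for ll, w in zip(lls, prior)]
    total = sum(joint)
    return [j / total for j in joint]


MU = 20  # hypothetical bound on m; possible values of q are k / MU
grid = [k / MU for k in range(MU + 1)]
uniform = [1 / len(grid)] * len(grid)  # the "all values equally likely" prior

results = {}
for n in (10, 100, 1000):
    r = round(0.3 * n)  # observed frequency q' = 0.3
    post = posterior(grid, uniform, n, r)
    best = grid[post.index(max(post))]
    near = sum(p for q, p in zip(grid, post) if abs(q - r / n) <= 1 / MU + 1e-12)
    results[n] = (best, near)
    print(n, best, round(near, 4))
```

Replacing `uniform` with unequal positive weights changes the figures for small n, though not the limiting behaviour.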

I see, however, no justification for the assumption that all
possible values of the ratio q are a priori equally likely. It is
not even equivalent to the assumptions that all integral values
of l and m respectively are equally probable. I am not satisfied
either that different values of q, or that different values of l and m,
satisfy the conditions which have been laid down in Part I. for
alternatives which are equal before the Principle of Indifference.
There seem, for instance, to be relevant differences between the
statement that A can arise in exactly two ways and the statement
that it can arise in exactly a thousand ways. We must,
therefore, be content with some lesser assumption and with a
less precise form for our final conclusion.
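That a uniform prior on q differs from equal probabilities for the integers l and m can be checked by enumeration. The sketch assumes, purely for illustration, that m is equally likely to be any of 1, ..., μ and that, given m, l is equally likely to be any of 0, ..., m. The induced probabilities of the distinct ratios q = l/m are then far from equal: with μ = 4, q = 1/2 gets more than twice the weight of q = 1/3, because it can arise both as 1/2 and as 2/4.

```python
from collections import defaultdict
from fractions import Fraction


def induced_prior_on_q(mu: int) -> dict[Fraction, Fraction]:
    """Distribution of q = l/m when m is uniform on 1..mu and, given m,
    l is uniform on 0..m (a hypothetical 'equally probable integers' prior)."""
    dist: dict[Fraction, Fraction] = defaultdict(Fraction)
    for m in range(1, mu + 1):
        for l in range(m + 1):
            dist[Fraction(l, m)] += Fraction(1, mu) * Fraction(1, m + 1)
    return dict(sorted(dist.items()))


for q, p in induced_prior_on_q(4).items():
    print(f"q = {str(q):>3}: {p}")
```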

4. Since, in accordance with our hypothesis, m cannot exceed
some finite number, and since l must necessarily be less than m,
the possible values of m, and therefore of q, are finite in number.
Perhaps we can assume, therefore, as one of our fundamental
assumptions, that there is a priori a finite probability in favour
of each of these possible values. Let μ be the finite number
which m cannot exceed. Then there is a finite probability for
each of the intervals1

$$
\frac{1}{\mu} \text{ to } \frac{2}{\mu}, \quad \frac{2}{\mu} \text{ to } \frac{3}{\mu}, \quad \ldots, \quad \frac{\mu-1}{\mu} \text{ to } 1
$$

that q lies in this interval; but we cannot assume that there is
an equal probability for each interval.

1 The intervals are supposed to include their lower but not their upper
limit.

We must now return to the formula

$$
f(q)/h\,f'(q') = \frac{f'(q')/h\,f(q) \cdot f(q)/h}{\sum_{i} f'(q')/h\,f(q_i) \cdot f(q_i)/h},
$$

which represents the a posteriori probability of q, given q'. Since,
by sufficiently increasing the number of instances, the sum of the
terms $f'(q')/h\,f(q)$ for possible values of q within a certain finite
interval in the neighbourhood of q' can be made to exceed the
other terms by any required amount, and since the sum of the
values of f(q)/h for possible values of q within this interval is
finite, it clearly follows that a finite number of instances can
make the probability that q lies in an interval of magnitude
1/μ in the neighbourhood of q' differ from certainty by less
than any finite amount however small.
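The limiting argument of this section can be illustrated with a deliberately unequal prior. The sketch below uses a hypothetical μ = 20, a grid of values k/μ, prior weights proportional to 1/(k+1)², and an observed frequency q' = 0.6 in every case. The posterior probability that q lies within 1/μ of q' rises towards certainty as n grows, even though the prior favours quite different values. The sketch does not tell us how many instances suffice in a real case: that depends on μ and on the prior, which are exactly what we do not know.

```python
from math import exp, inf, log

MU = 20  # hypothetical bound; possible values of q placed on the grid k / MU
GRID = [k / MU for k in range(MU + 1)]


def log_likelihood(q: float, n: int, r: int) -> float:
    # Binomial coefficient omitted: it is common to every q and cancels.
    if q == 0.0:
        return 0.0 if r == 0 else -inf
    if q == 1.0:
        return 0.0 if r == n else -inf
    return r * log(q) + (n - r) * log(1 - q)


def posterior_mass_near(prior: list[float], n: int, r: int) -> float:
    """Posterior probability that q lies within 1/MU of the observed q' = r/n."""
    q_obs = r / n
    lls = [log_likelihood(q, n, r) for q in GRID]
    top = max(lls)
    joint = [exp(ll - top) * w for ll, w in zip(lls, prior)]
    near = sum(j for q, j in zip(GRID, joint) if abs(q - q_obs) <= 1 / MU + 1e-12)
    return near / sum(joint)


# Every value of q gets a finite prior probability, but the weights are unequal
# and deliberately favour small q (an arbitrary, hypothetical choice).
raw = [1.0 / (k + 1) ** 2 for k in range(MU + 1)]
prior = [w / sum(raw) for w in raw]

mass = {}
for n in (10, 100, 1000, 5000):
    r = round(0.6 * n)  # observed frequency q' = 0.6
    mass[n] = posterior_mass_near(prior, n, r)
    print(n, round(mass[n], 6))
```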

5. We have, therefore, reached the main part of the conclusion
after which we set out, namely, that as the number of instances
is increased the probability that q is in the neighbourhood of
q' tends towards certainty; and hence that, subject to certain
specified conditions, if the frequency with which B accompanies
A is found to be q' in a great number of instances, then the
probability that A will be accompanied by B in any further
instance is also approximately q'. But we are left with the same
vagueness, as in the case of generalisation, respecting the value
of μ and the number of instances that we require. We know
that we can get as near certainty as we choose by a finite number
of instances, but what this number is we do not know. This is
not very satisfactory, but it accords very well, I think, with
what common sense tells us. It would be very surprising, in
fact, if logic could tell us exactly how many instances we want
to yield us a given degree of certainty in empirical arguments.

Nobody supposes that we can measure exactly the probability
of an induction. Yet many persons seem to believe that in the
weaker and much more difficult type of argument, where the
association under examination has been in our experience, not
invariable, but merely in a certain proportion, we can attribute
a definite measure to our future expectations and can claim
practical certainty for the results of predictions which lie within
relatively narrow limits. Coolly considered, this is a preposterous
claim, which would have been universally rejected long ago,
if those who made it had not so successfully concealed themselves
from the eyes of common sense in a maze of mathematics.

6. Meantime we are in danger of forgetting that, in order to
reach even our modified conclusion, material assumptions have
been introduced. In the first place, we are faced with exactly
the same difficulties as in the case of universal induction dealt
with in Part III., and our original starting-point must be the
same. We have the same difficulty as to how our initial probability
is to be obtained; and I have no better suggestion to offer
in this than in the former case, namely, the supposed principle
of a limitation of independent variety in experience. We have
to suppose that if A and B occur together (i.e. are true of the
same object), this is some just appreciable reason for supposing
that in this instance they have a common cause; and that, if
A occurs again, this is a just appreciable reason for supposing
that it is due to the same cause as on the former occasion. But
in addition to the usual inductive hypothesis, the argument has
rested on two particularly important assumptions: first, that we
have no reason for supposing that some of the events of which
A may be a sign are more likely to be exemplified in some of the
particular instances than in others; and secondly, that the analogy
amongst the examined B's is perfect. The first assumption
amounts, in the language of statisticians, to an assumption of
random sampling from amongst the A's. The second assumption
corresponds precisely to the similar condition which we discussed
fully in connection with inductive generalisation. The instances
of A(x) may be the result of random sampling, and yet it may
still be the case that there are material circumstances, common
to all the examined instances of B(x), yet not covered by the
statement A(x)B(x). In so far as these two assumptions are not
justified, an element of doubt and vagueness, which is not easily
measured, assails the argument. It is an element of doubt
precisely similar to that which exists in the case of generalisation.
But we are most likely to forget it. For having overcome
the difficulties peculiar to correlation,1 it is, possibly, not unnatural
for a statistician to feel as if he had overcome all the
difficulties.

1 I am here using this term in distinction to generalisation; that is to say,
I call the statement that A(x) is always accompanied by B(x) a generalisation,
and the statement that A(x) is accompanied by B(x) in a certain proportion
of cases a correlation. This is not quite identical with its use by modern
statisticians.

In practice, however, our knowledge, in cases of correlation
just as in cases of generalisation, will seldom justify the assumption
of perfect analogy between the B's; and we shall be faced
by precisely the same problems of analysing and improving our
knowledge of the instances, as in the general case of induction
already examined. If B has invariably accompanied A in 100
cases, we have all kinds of difficulties about the exact character
of our evidence before we can found on this experience a valid
generalisation. If B has accompanied A, not invariably, but
only 50 times in the 100 cases, clearly we have just the same
kind of difficulties to face, and more too, before we can announce
a valid correlation. Out of the mere unanalysed statement that B
has accompanied A as often as not in 100 cases, without precise
particulars of the cases, or even if there were 1,000,000 cases
instead of 100, we can conclude very little indeed.


CHAPTER XXXII

THE INDUCTIVE USE OF STATISTICAL FREQUENCIES FOR THE
DETERMINATION OF PROBABILITY A POSTERIORI: THE
METHODS OF LEXIS

1. No one supposes that a good induction can be arrived at
merely by counting cases. The business of strengthening the
argument chiefly consists in determining whether the alleged
association is stable, when the accompanying conditions are
varied. This process of improving the Analogy, as I have called
it in Part III., is, both logically and practically, of the essence of
the argument.

Now in statistical reasoning (or inductive correlation) that
part of the argument, which corresponds to counting the cases
in inductive generalisation, may present considerable technical
difficulty. This is especially so in the particularly complex cases
of what in the next chapter (§ 9) I shall term Quantitative Correlation,
which have greatly occupied the attention of English
statisticians in recent years. But clearly it would be an error to
suppose that, when we have successfully overcome the mathematical
or other technical difficulties, we have made any greater
progress towards establishing our conclusion than when, in the
case of inductive generalisation, we have counted the cases but
have not yet analysed or compared the descriptive and non-numerical
differences and resemblances. In order to get a good
scientific argument we still have to pursue precisely the same
scientific methods of experiment, analysis, comparison, and
differentiation as are recognised to be necessary to establish any
scientific generalisation. These methods are not reducible to a
precise mathematical form for the reasons examined in Part III.
of this treatise. But that is no reason for ignoring them, or for
pretending that the calculation of a probability, which takes into
account nothing whatever except the numbers of the instances,
is a rational proceeding. The passage already quoted from
Leibniz (In exemplis juridicis politicisque plerumque non tam
subtili calculo opus est, quam accurata omnium circumstantiarum
enumeratione; that is, in legal and political cases what is generally
needed is not so much subtle calculation as an accurate enumeration
of all the circumstances) is as applicable to scientific as to political inquiries.

Generally speaking, therefore, I think that the business of
statistical technique ought to be regarded as strictly limited to
preparing the numerical aspects of our material in an intelligible
form, so as to be ready for the application of the usual inductive
methods. Statistical technique tells us how to 'count the cases'
when we are presented with complex material. It must not
proceed also, except in the exceptional case where our evidence
furnishes us from the outset with data of a particular kind, to
turn its results into probabilities; not, at any rate, if we mean
by probability a measure of rational belief.

2. There is, however, one type of technical, statistical investigation
not yet discussed, which seems to me to be a valuable
aid to inductive correlation. This method consists in breaking
up a statistical series, according to appropriate principles, into
a number of sub-series, with a view to analysing and measuring,
not merely the frequency of a given character over the aggregate
series, but the stability of this frequency amongst the sub-series;
that is to say, the series as a whole is divided up by some
principle of classification into a set of sub-series, and the fluctuation
of the statistical frequency under examination between the
various sub-series is then examined. It is, in fact, a technical
method of increasing the Analogy between the instances, in the
sense given to this process in Part III.
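The first step of the method, dividing a series by a principle of classification and comparing the frequency in each sub-series with the aggregate, can be sketched as below. The record layout, the field names `district` and `has_character`, and the counts are all hypothetical. Whether the spread between sub-series is larger than chance alone would produce is a separate question, discussed later in the chapter.

```python
from collections import defaultdict


def subseries_frequencies(records, key):
    """Break a statistical series into sub-series by a principle of
    classification (the function `key`) and return the frequency of the
    character over the whole series and within each sub-series."""
    counts = defaultdict(lambda: [0, 0])  # label -> [occurrences, instances]
    for rec in records:
        label = key(rec)
        counts[label][0] += rec["has_character"]
        counts[label][1] += 1
    hits = sum(c[0] for c in counts.values())
    total = sum(c[1] for c in counts.values())
    by_sub = {label: c[0] / c[1] for label, c in sorted(counts.items())}
    return hits / total, by_sub


# Hypothetical data: 100 instances from each of three districts.
records = [
    {"district": d, "has_character": i < hits}
    for d, hits in (("A", 52), ("B", 48), ("C", 55))
    for i in range(100)
]
overall, by_district = subseries_frequencies(records, key=lambda rec: rec["district"])
spread = max(by_district.values()) - min(by_district.values())
print(round(overall, 4), by_district, round(spread, 2))
```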

3. The method of analysing statistical series, as opposed to
the Laplacian or mathematical method, one might designate the
inductive method. Independently of the investigations of
Bernoulli or Laplace, practical statisticians began at least as early
as the end of the seventeenth century1 to pay attention to the
stability of statistical series when analysed in this manner.
Throughout the eighteenth century, students of mortality
statistics, and of the ratio of male to female births (including
Laplace himself), paid attention to the degree of constancy of the
ratios over different parts of their series of instances as well as
to their average value over the whole series. And in the early
part of the nineteenth century, Quetelet, as we have already
noticed, widely popularised the notion of the stability of various
social statistics from year to year. Quetelet, however, sometimes
asserted the existence of stability on insufficient evidence, and
involved himself in theoretical errors through imitating the
methods of Laplace too closely; and it was not until the last
quarter of the nineteenth century that a school of statistical
theory was founded, which gave to this way of approaching the
problem the system and technique which it had hitherto lacked,
and at the same time made explicit the contrast between this
analytical or inductive method and the prevailing mathematical
theory. The sole founder of this school was the German economist,
Wilhelm Lexis, whose theories were expounded in a series
of articles and monographs published between the years 1875
and 1879. For some years Lexis's fundamental ideas did not
attract much notice, and he himself seems to have turned his
attention in other directions. But more recently a considerable
literature has grown up round them in Germany, and their full
purport has been expressed with more clearness than by Lexis
himself, although no one, with the exception of Ladislaus von
Bortkiewicz, has been able to make additions to them of any
great significance.1 Lexis devised his theory with an immediate
view to its practical application to the problems of sex ratio and
mortality. The fact that his general theory is so closely intermingled
with these particular applications of it is, probably, a
partial explanation of the long interval which elapsed before the
general theoretical importance of his ideas was widely realised.
I cannot help doubting how fully Lexis himself realised it in the
first instance. It would certainly be easy to read his earlier
contributions to the question without appreciating their generalised
significance. After 1879 Lexis added nothing substantial to
his earlier work, and later developments are mainly due to Von
Bortkiewicz. Those of the latter's writings which have an
important bearing on the relation between probability and
statistics are given in the Bibliography.1

1 Graunt, in his Natural and Political Observations upon the Bills of Mortality,
has been quoted as one of the earliest statisticians to pay attention to these
considerations.

1 A list of Lexis's principal writings on these topics will be found in the
Bibliography. There is little of first-rate importance which is not contained
either in the volume, Zur Theorie der Massenerscheinungen in der menschlichen
Gesellschaft, or in the Abhandlungen zur Theorie der Bevölkerungs- und Moralstatistik.
In this latter volume the two important articles on "Die Theorie der
Stabilität statistischer Reihen" and on "Das Geschlechtsverhältniss der
Geborenen und die Wahrscheinlichkeitsrechnung," originally published in Conrad's
Jahrbücher, are reprinted.

On the logic and philosophy of Probability writers of the
school of Lexis are in general agreement with Von Kries; but this
seems to be due rather to the reaction, common both to
him and to them, against the Laplacian tradition than to any
very intimate theoretical connection between Von Kries's main
contributions to Probability and those of Lexis, though it is true
that both show a tendency to find the ultimate basis of Probability
in physical rather than in logical considerations. I am not
acquainted with much work appreciably influenced
by Lexis which is written in other languages than German (including
with Germans, that is to say, those Russians, Austrians, and Dutch
who usually write in German, and are in habitual connection with
the German scientific world). In France Dormoy2 published
independently and at about the same time as Lexis some not
dissimilar theories, but subsequent French writers have paid
little attention to the work of either. Such typical French
treatises as that of Bertrand, or, more recently, that of Borel,
contain no reference to them.3 In Italy there has been some
discussion recently on the work of Von Bortkiewicz. Among
Englishmen Professor Edgeworth has shown a close acquaintance
with the work of the German school,4 and for nearly forty
years past he has provided, on this as on other matters where the realms of
Statistics and Probability overlap, almost the only connecting
link between English and continental thought.

1 The reader may be specially referred to the Kritische Betrachtungen zur
theoretischen Statistik (first instalment; the later instalments are of less interest
to the student of Probability), the Anwendungen der Wahrscheinlichkeitsrechnung
auf Statistik, and Homogeneität und Stabilität in der Statistik. Of other German
and Russian writers it will be sufficient to mention here Tschuprow, who in
"Die Aufgaben der Theorie der Statistik" (Schmoller's Jahrbuch, 1905) and "Zur
Theorie der Stabilität statistischer Reihen" (Skandinavisk Aktuarietidskrift) gives
by far the best and most lucid general accounts that are available of the doctrines
of the school, he alone amongst these authors writing in a style from which
the foreign reader can derive pleasure; and Czuber, who in his Wahrscheinlichkeitsrechnung
(vol. ii. part iv. section 1) supplies a useful mathematical
commentary.

2 Journal des actuaires français, 1874, and Théorie mathématique des assurances
sur la vie, 1878; on the question of priority see Lexis, Abhandlungen, p. 130.

3 Though both these writers touch on closely cognate matters, where Lexis's
investigations would be highly relevant: Bertrand, Calcul, pp. 312-314; Borel,
Éléments, p. 160.

4 See especially his "Methods of Statistics" in the Jubilee Volume of the
Stat. Journ., 1885, and "Application of the Calculus of Probabilities to
Statistics," International Statistical Institute Bulletin, 1910.

Nevertheless, an account in English of the main doctrines of
this school is still lacking. It would be outside the plan of the
present treatise to attempt such an account here. But it may
be useful to give a short summary of Lexis's fundamental ideas.
After giving this account I shall find it convenient, in proceeding
to my own incomplete observations on the matter, to approach
it from a rather different standpoint from that of Lexis or of
Von Bortkiewicz, though not for that reason the less influenced
or illuminated by their eminent contributions to this problem.

4. It will be clearer to begin with some analysis due to Von
Bortkiewicz,1 and then to proceed to the method of Lexis himself,
although the latter came first in point of time.

A group of observations may be made up of a number of subgroups,
to which different frequencies for the character under
investigation are properly applicable. That is to say, a proportion
$n_1/n$ of the observations may belong to a group for which, given
the frequency, the a priori probability of the character under
observation in a particular instance would be $p_1$; a proportion
$n_2/n$ may belong to a second group for which $p_2$ is the probability;
and so on. In this case, given the frequencies for the sub-groups,
the probability p for the group as a whole would be made up as
follows:

$$
p = \frac{n_1}{n}\,p_1 + \frac{n_2}{n}\,p_2 + \cdots
$$


We may call p a general probability, and $p_1$, etc., special probabilities.
But the special probabilities may in their turn be
general probabilities, so that there may be more than one way
of resolving a general probability into special probabilities.
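A small sketch of this decomposition, with hypothetical group sizes and special probabilities: the same general probability p = 57/100 arises from two different resolutions of 1000 observations, and only the second is indifferent in the sense defined in the next paragraph (all its special probabilities equal p). Exact fractions are used so that equality of the special probabilities can be tested without rounding error.

```python
from fractions import Fraction


def general_probability(groups):
    """groups: (n_i, p_i) pairs for the partial groups of one resolution.
    Returns the general probability p = sum of (n_i / n) * p_i."""
    n = sum(n_i for n_i, _ in groups)
    return sum(Fraction(n_i, n) * p_i for n_i, p_i in groups)


def is_indifferent(groups):
    """True when every special probability p_i of this resolution is equal."""
    return len({p_i for _, p_i in groups}) == 1


# Two hypothetical resolutions of the same 1000 observations.
by_district = [(300, Fraction(1, 2)), (700, Fraction(3, 5))]
by_year = [(500, Fraction(57, 100)), (500, Fraction(57, 100))]

for name, groups in (("district", by_district), ("year", by_year)):
    print(name, general_probability(groups), is_indifferent(groups))
```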

If $p_1 = p_2 = \cdots = p$, then p, for that particular way of resolving
the total group into partial groups, is, in Bortkiewicz's terminology,
indifferent. If p is indifferent for all conceivable resolutions
into partial groups,2 then, borrowing a phrase from Von Kries,
Bortkiewicz says of it that it has a definitive interpretation. In
dealing with a priori probabilities, we can resolve a total probability
until we reach the special probabilities of each individual
case; and if we find that all these special probabilities are equal,
then, clearly, the general probability satisfies the condition for
definitive interpretation.

1 What follows is a free rendering of some passages in his Kritische
Betrachtungen.

2 This is clearly a very loose statement of what Bortkiewicz really means.

So far we have been dealing with a priori probabilities. But
the object of the analysis has been to throw light on the inverse
problem. We want to discover in what conditions we can regard
an observed frequency as being an adequate approximation to a
definitive general probability.

If  pf  is  the  empirical  value  of  p  (or,  as  I  should  prefer  to  call 
it,  the  frequency)  given  by  a  series  of  n  observations,  we  may 
have 


Even  if  this  particular  way  of  resolving  the  series  of  observations 
is  indifferent,  the  actually  observed  frequencies  p-^^p^',  etc.,  may 
nevertheless  be  unequal,  since  they  may  fluctuate  round  the 
norm  p'  through  the  operation  of  '  chance  '  influences.  If, 
however,  nv  n2,  etc.,  are  large,  we  can  apply  the  usual  Bernoullian 
formula  to  discover  whether,  if  there  was  a  norm  p',  the  diverg 
ences  of  PI,  p2',  etc.,  from  it  are  within  the  limits  reasonably  attri 
butable  on  Bernoullian  hypotheses  to  '  chance  '  influences.  We 
can,  however,  only  base  a  sound  argument  in  favour  of  the 
existence  of  a  '  definitive  '  probability  p'  by  resolving  our 
aggregate  of  instances  into  sub-series  in  a  great  variety  of  ways, 
and  applying  the  above  calculations  each  time.  Even  so,  some 
measure  of  doubt  must  remain,  just  as  in  the  case  of  other 
inductive  arguments. 
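The Bernoullian check just described can be sketched as follows. This is a minimal illustration under stated assumptions, not Bortkiewicz's own procedure. It measures each sub-series frequency $p_i'$ against the pooled frequency $p'$ in units of the standard error $\sqrt{p'(1-p')/n_i}$ (a standard deviation rather than a probable error), and the birth counts are invented for the example.

```python
import math

def subseries_deviations(counts):
    """counts: list of (successes, trials), one pair per sub-series.

    Returns the pooled frequency p' and, for each sub-series, the deviation
    of its frequency from p' in units of sqrt(p'(1 - p') / n_i).
    """
    total_successes = sum(s for s, _ in counts)
    total_trials = sum(n for _, n in counts)
    p_pooled = total_successes / total_trials  # p' = sum(n_i p_i') / sum(n_i)
    z = []
    for s, n in counts:
        standard_error = math.sqrt(p_pooled * (1 - p_pooled) / n)
        z.append((s / n - p_pooled) / standard_error)
    return p_pooled, z

# Hypothetical sub-series: (male births, total births) for four periods.
counts = [(5150, 10000), (5120, 10000), (5180, 10000), (5130, 10000)]
p_pooled, z = subseries_deviations(counts)
# Rough screen: if only 'chance' is at work, the sum of z**2 should be of the
# order of the number of sub-series less one.
print(round(p_pooled, 4), [round(v, 2) for v in z], round(sum(v * v for v in z), 2))
```

As the text stresses, one such resolution proves little; the same check would have to be repeated for many different ways of splitting the aggregate.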

Bortkiewicz goes on to say that probabilities having definitive
interpretation (definitive Bedeutung) may be designated elementary
probabilities (Elementarwahrscheinlichkeiten). But the
probabilities which usually arise in statistical inquiries are not
of this type, and may be termed average probabilities
(Durchschnittswahrscheinlichkeiten). That is to say, a series of observed
frequencies (or, as he calls them, empirical probabilities) does not,
as a rule, group itself as it would if the series was in fact subject
to an elementary probability.

5. This exposition is based on a philosophy of Probability
different from mine; but the underlying ideas are capable of
translation. Suppose that one is endeavouring to establish an
inductive correlation, e.g. that the chance of a male birth is $m$.
The conclusion, which we are seeking to establish, takes no
account of the place or date of birth or the race of the parents,
and assumes that these influences are irrelevant. Now, if we had
statistics of birth ratios for all parts of the world throughout the
nineteenth century, and added them all up and found that the
average frequency of male births was $m$, we should not be justified
in arguing from this that the frequency of male births in England
next year is very unlikely to diverge widely from $m$. For this
would involve the unwarranted assumption, in Bortkiewicz's
terminology, that the empirical probability $m$ is elementary for
any resolution dependent on time or place, and is not an average
probability compounded out of a series of groups, relating to
different times or places, to each of which a distinct special
probability is applicable. And, in my terminology, it would
assume that variations of time and place were irrelevant to the
correlation, without any attempt having been made to employ
the methods of positive and negative Analogy to establish this.

We must, therefore, break up our statistical material into
groups by date, place, and any other characteristic which our
generalisation proposes to treat as irrelevant. By this means
we shall obtain a number of frequencies $m_1', m_2', m_3', \dots, m_1'',
m_2'', m_3''$, etc., which are distributed round the average
frequency $m$. For simplicity let us consider the series of
frequencies $m_1', m_2', m_3', \dots$ obtained by breaking up our
material according to the date of the birth. If the observed
divergences of these frequencies from their mean are not significant,
we have the beginnings of an inductive argument for
regarding date as being in this connection irrelevant.

6. At this point Lexis's fundamental contribution to the
problem must be introduced. He concentrated his attention on
the nature of the dispersion of the frequencies $m_1', m_2', m_3', \dots$
round their mean value $m$; and he sought to devise a technical
method for measuring the degree of stability displayed by the
series of sub-frequencies, which are yielded by the various possible
criteria for resolving the aggregate statistical material into a
number of constituent groups.

For this purpose he classified the various types of dispersion
which could occur. It may be the case that some of the
sub-frequencies show such wide and discordant variations from the
mean as to suggest that some significant Analogy has been
overlooked. In this event the lack of symmetry, which characterises
the oscillations, may be taken to indicate that some of the
sub-groups are subject to a relevant influence, of which we must take
account in our generalisation, to which some of the other
sub-groups are not subject.

But  amongst  the  various  types  of  dispersion  Lexis  found  one 
class  clearly  distinguishable  from  all  the  others,  the  peculiarity 
of  which  is  that  the  individual  values  fluctuate  in  a  '  purely 
chance  '  manner  about  a  constant  fundamental  value.  This 
type  he  called  typical  (typische)  dispersion.  He  meant  by  this 
that  the  dispersion  conformed  approximately  to  the  distribution 
which  would  be  given  by  some  normal  law  of  error. 

The  next  stage  of  Lexis's  argument  1  was  to  point  out  that 
series  of  frequencies  which  are  typical  in  character  may  have  as 
their  foundation  either  a  constant  probability,2  or  one  which  is 
itself  subject  to  chance  variations  about  a  mean.  The  first  case 
is  typified  by  the  example  of  a  series  of  sets  of  drawings  of  balls, 
each  set  being  drawn  from  a  similar  urn  ;  the  second  case  by  the 
example  of  a  series  of  sets  of  drawings,  the  urns  from  which  each 
set  is  drawn  being  not  similar,  but  with  constitutions  which  vary 
in  a  chance  manner  about  a  mean. 
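The two urn cases just described can be simulated to show how they differ in spread. This is a minimal sketch under assumptions of its own: 200 sets of 1000 draws, a constant probability of 0.5 in the first case, and in the second an urn constitution drawn uniformly between 0.4 and 0.6. The quantity printed is the ratio of the observed spread of the set frequencies to the Bernoullian spread, which is the quantity Lexis calls Q. The factor used for the probable error cancels in the ratio, so standard deviations are used directly.

```python
import math
import random

def dispersion_ratio(freqs, g):
    """Observed spread of set frequencies divided by the Bernoullian spread.

    freqs: observed frequency in each set; g: number of draws per set.
    """
    n = len(freqs)
    mean = sum(freqs) / n
    observed_var = sum((f - mean) ** 2 for f in freqs) / n  # [delta^2] / n
    bernoulli_var = mean * (1 - mean) / g                   # v(1 - v) / g
    return math.sqrt(observed_var / bernoulli_var)

def draw_sets(prob_for_set, sets, g, rng):
    """Frequency of 'black' in each of `sets` sets of g independent draws."""
    freqs = []
    for _ in range(sets):
        v = prob_for_set(rng)  # the urn constitution for this set
        black = sum(1 for _ in range(g) if rng.random() < v)
        freqs.append(black / g)
    return freqs

rng = random.Random(1)  # fixed seed so the sketch is reproducible
sets, g = 200, 1000
# Case 1: every set drawn from a similar urn (constant probability 0.5).
same_urns = draw_sets(lambda r: 0.5, sets, g, rng)
# Case 2: urn constitution varies by chance about 0.5 (uniform 0.4..0.6, assumed).
varying_urns = draw_sets(lambda r: r.uniform(0.4, 0.6), sets, g, rng)

print(round(dispersion_ratio(same_urns, g), 2))     # close to 1
print(round(dispersion_ratio(varying_urns, g), 2))  # well above 1
```

Both series can look 'typical' in shape; only the comparison with the Bernoullian spread separates them.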

As his measure of dispersion Lexis introduces a formula, which
is evidently in part conventional (as is the case with so many
other statistical formulae, the particular shape of which is often
determined by mathematical convenience rather than by any
more fundamental criterion). He expresses himself as follows.
Where the underlying probability is constant, the probable error
in a particular frequency a priori is

$$r = \rho\sqrt{\frac{2v(1-v)}{g}},$$

where $\rho = 0.4769$, $v$ is the underlying probability, and $g$ is the number of
instances to which the frequency refers. This follows from the
usual Bernoullian assumptions. Now let $R$ be the corresponding
expression derived a posteriori, by reference to the actual
deviations of a series of observed frequencies from their mean, so that

$$R = \rho\sqrt{\frac{2[\delta^2]}{n}},$$

where $[\delta^2]$ is the sum of the squares of the deviations of the
individual frequencies from their mean and $n$ is their number.

1 I am here following fairly closely his paper, "Über die Theorie der
Stabilität statistischer Reihen," reprinted in his Abhandlungen zur
Theorie der Bevölkerungs- und Moral-Statistik, pp. 170-212.

2 This mode of expression, which is not in accurate conformity with my
philosophy of Probability, is Lexis's, not mine. His meaning is intelligible.

Now, if the observed facts are due to merely chance
variations about a constant $v$, we must have approximately
$R = r$, though, if $n$ is small, comparatively wide deviations between
$R$ and $r$ will not be significant. If, on the other hand, $v$
itself is not constant but is subject to chance variations, the case
stands differently. For the fluctuations of the observed frequencies
are now due to two components. The one which would
be present, even if the underlying probability were constant,
Lexis terms the ordinary or unessential component; the other
he terms the physical component. If $p$ is the probable deviation
of the various values of $v$ from their mean, then, on the same
assumptions and as a deduction from the same theory as before,
$R$ will tend to equal not $r$ but $\sqrt{r^2 + p^2}$. In this event $R$ cannot
be less than $r$. If, therefore, $R < r$, one must suppose that the
individual instances of each several series on which each frequency
is based are not independent of one another. Such a series
Lexis terms an organic or dependent (gebundene) series, and
explains that it cannot be handled by purely statistical methods.
Since, therefore, we have three types of series, differing
fundamentally from one another according as $R = r$, $R > r$, or $R < r$,
Lexis puts $R/r = Q$, and takes $Q$ as his measure of dispersion.1 If
$Q = 1$, we have normal dispersion; if $Q > 1$, we have supernormal
dispersion; and if $Q < 1$, we have subnormal dispersion, which is
an indication that the series is 'organic.'

1 In Tschuprow's notation (Die Aufgaben der Theorie der Statistik, p. 45),
$Q = P/C$, where $P$ (the Physical modulus) $= \sqrt{2\sum_{k=1}^{n}(p_k - p)^2/n}$
and $C$ (the Combinatorial modulus) $= \sqrt{2p(1-p)/M}$, $M$ being the number
of instances in each set, $n$ the number of sets, $p_k$ the frequency for
set $k$, and $p$ the mean of the $n$ frequencies.

If the number of instances on which the frequencies are based
is very great, $r$ becomes negligible in comparison with $p$ (the
physical component), and, therefore, $R = \sqrt{r^2 + p^2}$ becomes
approximately $R = p$. On the other hand, if $p$ is not very large
and the base number of instances is small, $p$ becomes negligible
in comparison with $r$, and we have a delusive appearance of
normal dispersion.1 Lexis well illustrates the former point by
the example that the statistics of the ratio of male to female
births for the forty-five registration districts of England over the
years 1859-1871 approximately satisfy the relation $R = r$. But
if we take the figures for all England over those thirteen years,
although the extreme limits of the fluctuation of the ratio about
its mean 1.042 are 1.035 and 1.047, nevertheless $R = 2.6$ and $r = 1.6$,
so that $Q = 1.625$; the explanation being that the base number
of instances, namely 730,000, is so large that $r$ is very small, with
the result that it is swamped by the physical component $p$. And
he illustrates the latter point by the assertion that, if in 20 or 30
series each of 100 draws from an urn containing black and white
balls equally, the number of black balls drawn each time were
only to vary between 49 and 51, he would have confidence that
the game was in some way falsified and that the draws were not
independent. That is to say, undue regularity is as fatal to the
assumption of Bernoullian conditions as is undue dispersion.
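Lexis's urn example can be checked directly. With 100 draws from an urn that is half black and half white, the number of black balls has standard deviation $\sqrt{100 \cdot \tfrac12 \cdot \tfrac12} = 5$, so a range of 49 to 51 is very narrow. The sketch below assumes the series are independent, as Bernoullian conditions require. It computes the chance that one series falls in that range, and the chance that 20 or 30 series all do.

```python
from math import comb

def prob_between(lo, hi, trials, p=0.5):
    """P(lo <= X <= hi) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p ** k * (1 - p) ** (trials - k)
               for k in range(lo, hi + 1))

single = prob_between(49, 51, 100)  # one series of 100 draws: about 0.236
for series in (20, 30):
    # Independent series: the probabilities multiply.
    print(series, single ** series)
```

A single series in the range is unremarkable, but twenty in a row has a probability of the order of $10^{-13}$. That is why such regularity points to dependent draws.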

7. In a characteristic passage 2 Professor Edgeworth has applied
these theories to the frequency of dactyls in successive extracts
from the Aeneid. The mean for the line is 1.6, exclusive of the
fifth foot, thus sharply distinguishing the Virgilian line from the
Ovidian, for which the corresponding figure is 2.2. But there is
also a marked stability. "That the Mean of any five lines
should differ from the general Mean by a whole dactyl is proved
to be an exceptional phenomenon, about as rare as an Englishman
measuring 5 feet, or 6 feet 3 inches. An excess of two dactyls
in the Mean of five lines would be as exceptional as an Englishman
measuring 6 feet 10 inches." But not only so: the stability is
excessive, and the fluctuation is less "than that which is obtained
upon the hypothesis of pure sortition. If we could imagine
dactyls and spondees to be mixed up in the poet's brain in the
proportion of 16 to 24 and shaken out at random, the modulus
in the number of dactyls would be 1.38, whereas we have constantly
obtained a smaller number, on an average (the square
root of the average fluctuation) 1.2." On Lexian principles
these statistical results would support the hypothesis that the
series under investigation is 'organic' and not subject to
Bernoullian conditions, an hypothesis in accordance with our
ideas of poetry. That Edgeworth should have put forward
this example in criticism of Lexis's conclusions, and that Lexis 1
should have retorted that the explanation was to be found in
Edgeworth's series' not consisting of an adequate number of
separate observations, indicates, if I do not misapprehend them,
that these authorities are at fault in the principles, if not of
Probability, of Poetry.

1 This is part of the explanation of Bortkiewicz's Law of Small Numbers.
See also p. 401.

2 "On Methods of Statistics," Jubilee Volume of the Royal Statistical Society,
p. 211.
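Edgeworth's figures can be reproduced under one stated assumption. Suppose the count covers the first four feet of each line, since the fifth is excluded in the text and the sixth is never a dactyl, and each of those feet is a dactyl with probability 16/40. Then the expected mean is 1.6 per line. The modulus, $\sqrt{2}$ times the standard deviation, is $\sqrt{2 \cdot 4 \cdot 0.4 \cdot 0.6} \approx 1.386$, which agrees with the quoted 1.38. Set against that modulus, the observed 1.2 gives a ratio below 1.

```python
import math

def binomial_modulus(trials, p):
    """Modulus (sqrt(2) times the standard deviation) of a binomial count."""
    return math.sqrt(2 * trials * p * (1 - p))

# Assumption: four variable feet per line, each a dactyl with probability 16/40.
p_dactyl = 16 / (16 + 24)
expected_mean = 4 * p_dactyl                # 1.6 dactyls per line
modulus = binomial_modulus(4, p_dactyl)     # about 1.386 under pure sortition
observed_modulus = 1.2                      # Edgeworth's average figure
ratio = observed_modulus / modulus          # below 1: subnormal dispersion
print(round(expected_mean, 2), round(modulus, 3), round(ratio, 3))
```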

The dactyls of the Virgilian hexameter are, in fact, a very
good example of what has been termed connexité, leading to
subnormal dispersion. The quantities of the successive feet are not
independent, and the appearance of a dactyl in one foot diminishes
the probability of another dactyl in that line. It is like the case
of drawing black and white balls out of an urn, where the balls
are not replaced. But Lexis is wrong if he supposes that a
supernormal dispersion cannot also arise out of connexité, or organic
connection between the successive terms. It might have been
the case that the appearance of a dactyl in one foot increased
the probability of another dactyl in that line. He should, I
think, have contemplated the result $R > r$ as possibly indicating
a non-typical, organic series, and should not have assumed that,
where $R$ is greater than $r$, it is of the form $\sqrt{r^2 + p^2}$.
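Both directions of connexité can be shown with one standard urn scheme. The Pólya-Eggenberger urn is used here as an illustrative model; it does not appear in the text. After each draw the ball is returned together with $c$ further balls of the same colour. Setting $c = -1$ gives drawing without replacement, which lowers the variance below the Bernoullian value. Setting $c > 0$ makes a drawn colour more likely to recur, which raises the variance. The 16 black and 24 white balls only echo Edgeworth's proportion for illustration.

```python
from fractions import Fraction

def urn_variance(n, black, total, c):
    """Variance of the number of black balls in n draws from an urn.

    After each draw the ball is returned together with c further balls of the
    same colour (Polya-Eggenberger scheme):
      c = 0  -> ordinary Bernoullian drawing with replacement,
      c = -1 -> drawing without replacement,
      c > 0  -> each colour drawn makes the same colour more likely next time.
    Variance = n p q (total + n c) / (total + c).
    """
    p = Fraction(black, total)
    return n * p * (1 - p) * Fraction(total + n * c, total + c)

# Illustrative urn: 16 black, 24 white, 4 draws.
for c in (-1, 0, 1):
    print(c, urn_variance(4, 16, 40, c))
```

With $c = -1$ the variance is below the Bernoullian value, giving subnormal dispersion. With $c = 1$ it is above that value, so a supernormal result can also come from organic connection and not only from a varying underlying probability.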

In short, Lexis has not pushed his analysis far enough, and he
has not fully comprehended the character of the underlying
conditions. But this does not affect the fact that it was he who
made the vital advance of taking as the unit, not the single
observation, but the frequency in given conditions, and of
conceiving the nature of statistical induction as consisting in the
examination, and if possible the measurement, of the stability
of the frequency when the conditions are varied.

8. There is one special piece of work illustrative of the above
methods, due to Von Bortkiewicz, which must not be overlooked,
and which it is convenient to introduce in this place: the
so-called Law of Small Numbers.2

Quetelet, as we have seen in Chapter XXVIII., called attention
to the remarkable regularity of comparatively rare events. Von
Bortkiewicz has enlarged Quetelet's catalogue with modern
instances out of the statistical records of bureaucratic Germany.
The classic instance, perhaps, is the number of Prussian cavalrymen
killed each year by the kick of a horse. The table is worth
giving as a statistical curiosity. (The period is from 1875 to
1894; G stands for the Corps of Guards, and I.-XV. for the
15 Army Corps.)

1 "Über die Wahrscheinlichkeitsrechnung," p. 444 (see Bibliography).

2 There are numerous references to this phenomenon in periodical literature;
but it is sufficient to refer the reader to Von Bortkiewicz's Das Gesetz der
kleinen Zahlen.


[Table: number of men killed by the kick of a horse in each corps (rows for
the Corps of Guards and the numbered Army Corps) in each year from 1875 to
1894 (columns 75 to 94); the individual entries are not legible in this copy.]

The agreement of this table with the theoretical results of a
random distribution of the total number of casualties is
remarkably close:1

Casualties      Number of Occasions on which the Annual
in a Year.      Casualties in a Corps reach the Figure in Column 1.
                     Actual        Theoretical
    0                 144            143.1
    1                  91             92.1
    2                  32             33.3
    3                  11              8.9
    4                   2              2.0
    5 and more         ..              0.0

Other  instances  are  furnished  by  the  numbers  of  child  suicides 
in  Prussia,  and  the  like. 
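The 'random distribution' in the table can be approximated by fitting a Poisson law to the actual column. The Poisson law is a standard model for rare events, assumed here rather than taken from the text. The fit gives 196 casualties over 280 corps-years, a mean of 0.7, and expected counts of about 139.0, 97.3, 34.1, 7.9 and 1.4 for 0 to 4 casualties, with about 0.2 for five or more. These figures are close to, though not identical with, the theoretical column quoted above, which may rest on a somewhat different calculation.

```python
import math

# Actual column of the table: number of corps-years with k casualties.
observed = {0: 144, 1: 91, 2: 32, 3: 11, 4: 2}

occasions = sum(observed.values())                    # 280 corps-years
casualties = sum(k * n for k, n in observed.items())  # 196 deaths
lam = casualties / occasions                          # mean per corps-year

def poisson_expected(k, lam, total):
    """Expected number of occasions with exactly k events under a Poisson law."""
    return total * math.exp(-lam) * lam ** k / math.factorial(k)

expected = {k: poisson_expected(k, lam, occasions) for k in range(5)}
expected_5_plus = occasions - sum(expected.values())
for k in range(5):
    print(k, observed[k], round(expected[k], 1))
print("5+", 0, round(expected_5_plus, 1))
```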

It is Von Bortkiewicz's thesis that these observed regularities
have a good theoretical explanation behind them, which he
dignifies with the name of the Law of Small Numbers.

1 Bortkiewicz, op. cit. p. 24.

The reader will recall that, according to the theory of Lexis,
his measure of stability $Q$ is, in the more general case, made up
of two components $r$ and $p$, combined in the expression $\sqrt{r^2 + p^2}$,
of which one is due to fluctuations from the average of the
conditions governing all the members of a series, which furnishes us
with one of our observed frequencies, and of which the other is
due to fluctuations in the individual members of the series about
the true norm of the series. Bortkiewicz carries the same
analysis a little further, and shows that Lexis's $Q$ is of the form
$\sqrt{1 + (n-1)c^2}$, where $n$ is the number of times that the event
occurs in each series.1 That is to say, $Q$ increases with $n$, and
when $n$ is small, $Q$ is likely to exceed unity to a less extent than
when $n$ is large. To postulate that $n$ is small is, when we are
dealing with observations drawn from a wide field, the same
thing as to say that the event we are looking for is a comparatively
rare one. This, in brief, is the mathematical basis of the Law
of Small Numbers.
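The dependence on $n$ in Bortkiewicz's form can be tabulated directly. The passage does not define $c$. This sketch treats it as a fixed positive constant, with the hypothetical value 0.1, so only the effect of $n$ shows.

```python
import math

def lexis_q(n, c):
    """Bortkiewicz's form of Lexis's Q: sqrt(1 + (n - 1) * c**2).

    n: number of times the event occurs in each series.
    c: treated here as a fixed positive constant (an assumption for the sketch).
    """
    return math.sqrt(1 + (n - 1) * c ** 2)

c = 0.1  # hypothetical value
for n in (1, 2, 5, 50, 500):
    print(n, round(lexis_q(n, c), 3))
```

For a rare event ($n$ small), $Q$ stays close to 1 whatever $c$ may be. For a common one it can rise well above 1.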

In his latest published work on these topics,2 Von Bortkiewicz
builds his mathematical structure considerably higher, without,
however, any further underpinning of the logical foundations.
He has there worked out further statistical constants
arising out of the conceptions on which Lexis's $Q$ is based (the
precise bearing of which is not made any clearer by his calling
them coefficients of syndromy), which are explicitly dependent
on the value of $n$; and he elaborately compares the theoretical
value of the coefficients with the observed value in certain actual
statistical material. He concludes with the thesis that
Homogeneity and Stability (defined as he defines them) are opposed
conceptions, and that it is not correct to premise that the larger
statistical mass is, as a rule, more stable than the smaller, unless
we also assume that the larger mass is less homogeneous. At this
point, it would have helped if Von Bortkiewicz, excluding from
his vocabulary homogeneity, paradromy, j'M, and the like, had
stopped to tell in plain language where his mathematics had led
him, and also whence they had started. But like many other
students of Probability he is eccentric, preferring algebra to earth.

1 I refer the reader to the original, op. cit.
9.  Where,  then,  though  an  admirer,  do  I  criticise  all  this  ?  I 
think  that  the  argument  has  proceeded  so  far  from  the  premisses, 
that  it  has  lost  sight  of  them.  If  the  limitations  prescribed  by 
the  premisses  are  kept  in  mind,  I  do  not  contest  the  mathematical 
accuracy  of  the  results.  But  many  technical  terms  have  been 
introduced,  the  precise  signification  and  true  limitations  of  which 
will  be  misunderstood  if  the  conclusion  of  the  argument  is  allowed 
to  detach  itself  from  the  premisses  and  to  stand  by  itself.  I  will 
illustrate  what  I  mean  by  two  examples  from  the  work  of  Von 
Bortkiewicz  described  above. 

Von  Bortkiewicz  enunciates  the  seeming  paradox  that  the 
larger  statistical  mass  is  only,  as  a  rule,  more  stable  if  it  is  less 
homogeneous.  But  an  illustration  which  he  himself  gives  shows 
how  misleading  his  aphorism  is.  The  opposition  between 
stability  and  homogeneity  is  borne  out,  he  says,  by  the  judgment 
of  practical  men.  For  actuaries  have  always  maintained  that 
their  results  average  out  better,  if  their  cases  are  drawn  from  a 
wide  field  subject  to  variable  conditions  of  risk,  whilst  they  are 
chary  of  accepting  too  much  insurance  drawn  from  a  single 
homogeneous  area  which  means  a  concentration  of  risk.  But 
this  is  really  an  instance  of  Von  Bortkiewicz's  own  distinction 
between a general probability $p$ and special probabilities $p_1$, etc.,
where

$$p = \frac{n_1 p_1 + n_2 p_2 + \dots}{n_1 + n_2 + \dots}.$$

If we are basing our calculations on $p$ and do not know $p_1$, $p_2$,
etc., then these calculations are more likely to be borne out by
the  result  if  the  instances  are  selected  by  a  method  which  spreads 
them  over  all  the  groups  1,  2,  etc.,  than  if  they  are  selected  by  a 
method  which  concentrates  them  on  group  1.  In  other  words, 
the  actuary  does  not  like  an  undue  proportion  of  his  cases  to  be 
drawn  from  a  group  which  may  be  subject  to  a  common  relevant 
influence  for  which  he  has  not  allowed.  If  the  a  priori  calculations 
are  based  on  the  average  over  a  field  which  is  not  homogeneous 
in all its parts, greater stability of result will be obtained if the
instances are drawn from all parts of the non-homogeneous
total  field,  than  if  they  are  drawn  now  from  one  homogeneous 
sub-field  and  now  from  another.  This  is  not  at  all  paradoxical. 
Yet  I  believe,  though  with  hesitation,  that  this  is  all  that  Von 
Bortkiewicz's  elaborately  supported  mathematical  conclusion 
really  amounts  to. 
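The actuary's preference can be put in numbers. The sketch uses the law of total variance, with two equally weighted groups whose special probabilities are hypothetically 0.1 and 0.3. The actuary is supposed to know only their average. Spreading the N cases evenly over the groups leaves only sampling variance. Taking all N cases from a single group chosen at random adds the variance between the special probabilities.

```python
def variance_of_frequency(p_groups, N, spread):
    """Variance of the observed frequency in N cases, under two selection methods.

    p_groups: special probabilities p_1, p_2, ... of equally weighted groups
              (hypothetical values; only their average p is known).
    spread=True : N / k cases taken from each of the k groups.
    spread=False: all N cases taken from one group chosen at random.
    """
    k = len(p_groups)
    mean_pq = sum(p * (1 - p) for p in p_groups) / k
    if spread:
        return mean_pq / N
    p_bar = sum(p_groups) / k
    between = sum((p - p_bar) ** 2 for p in p_groups) / k
    return mean_pq / N + between

groups = [0.1, 0.3]
print(variance_of_frequency(groups, 100, spread=True))   # about 0.0015
print(variance_of_frequency(groups, 100, spread=False))  # about 0.0115
```

The gap between the two figures is just the between-group variance that the concentrated method fails to average out. That is the unparadoxical content the text finds in the actuaries' practice.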

My  second  example  is  that  of  the  Law  of  Small  Numbers. 
Here  also  we  are  presented  with  an  apparent  paradox  in  the 
statement  that  the  regularity  of  occurrence  of  rare  events  is  more 
stable  than  that  of  commoner  events.  Here,  I  suspect,  the 
paradoxical  result  is  really  latent  in  the  particular  measure  of 
stability  which  has  been  selected.  If  we  look  back  at  the  figures, 
which  I  have  quoted  above,  of  Prussian  cavalrymen  killed  by 
the  kick  of  a  horse,  it  is  evident  that  a  measure  of  stability  could 
be  chosen  according  to  which  exceptional  instability  would  be 
displayed  by  this  particular  material  ;  for  the  frequency  varies 
from  0  to  4  round  a  mean  somewhat  less  than  unity,  which  is  a 
very  great  percentage  fluctuation.  In  fact,  the  particular  measure 
of  stability  which  Von  Bortkiewicz  has  adopted  from  Lexis  has 
about  it,  however  useful  and  convenient  it  may  be,  especially  for 
mathematical  manipulation,  a  great  deal  that  is  arbitrary  and 
conventional.  It  is  only  one  out  of  a  great  many  possible 
formulae which might be employed for the numerical measurement
of the conception of stability, which, quantitatively at
least,  is  not  a  perfectly  precise  one.  The  so-called  Law  of  Small 
Numbers  is,  therefore,  little  more  than  a  demonstration  that, 
where  rare  events  are  concerned,  the  Lexian  measure  of  stability 
does  not  lead  to  satisfactory  results.  Like  some  other  formulae 
which  involve  a  use  of  Bernoullian  methods  in  an  approximative 
form,  it  does  not  lead  to  reliable  results  in  all  circumstances. 
I  should  add  that  there  is  one  other  element  which  may  contribute 
to  the  total  psychological  reaction  of  the  reader's  mind  to  the 
Law  of  Small  Numbers,  namely,  the  surprising  and  piquant 
examples  which  are  cited  in  support  of  it.  It  is  startling  and 
even  amusing  to  be  told  that  horses  kick  cavalrymen  with  the 
same sort of regularity as characterises the rainfall. But our
surprise  at  this  particular  example's  fulfilling  the  Law  of  Great 
Numbers  has  little  or  nothing  to  do  with  the  exceptional  stability 
about  which  the  Law  of  Small  Numbers  purports  to  concern  itself. 
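The remark above, that another measure of stability would show great instability in the horse-kick figures, can be made concrete with the actual column of the second table. The sketch computes the mean (0.7) and variance (0.76) of the yearly counts per corps. It then compares a relative measure of fluctuation, the coefficient of variation (about 1.25), with a Lexis-style ratio (about 1.04). The Lexis-style ratio treats the Bernoullian variance of each count as roughly equal to its mean, a Poisson approximation assumed here for rare events.

```python
import math

observed = {0: 144, 1: 91, 2: 32, 3: 11, 4: 2}  # actual column of the table
total = sum(observed.values())
mean = sum(k * n for k, n in observed.items()) / total
var = sum(n * (k - mean) ** 2 for k, n in observed.items()) / total

# Relative fluctuation: standard deviation as a fraction of the mean.
coefficient_of_variation = math.sqrt(var) / mean
# Lexis-style ratio: observed spread over the Bernoullian spread, taking the
# Bernoullian variance of each rare-event count to be about its mean.
lexis_style_q = math.sqrt(var / mean)

print(round(mean, 3), round(var, 3))
print(round(coefficient_of_variation, 2), round(lexis_style_q, 2))
```

The same figures look highly unstable on the first measure and almost exactly 'normal' on the second, which supports the view that the paradox lies in the choice of measure.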


CHAPTER  XXXIII 

OUTLINE    OF   A    CONSTRUCTIVE   THEORY 

1.  THERE  is  a  great  difference  between  the  proposition  "  It  is 
probable  that  every  instance  of  this  generalisation  is  true  "  and 
the proposition "It is probable of any instance of this generalisation
taken at random that it is true." The latter proposition
may  remain  valid,  even  if  it  is  certain  that  some  instances  of  the 
generalisation  are  false.  It  is  more  likely  than  not,  for  example, 
that  any  number  will  be  divisible  either  by  two  or  by  three,  but 
it  is  not  more  likely  than  not  that  all  numbers  are  divisible  either 
by  two  or  by  three. 
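The arithmetic example can be checked by counting. The sketch takes 'any number' to mean an integer chosen uniformly from a long run such as 1 to 6000; the text does not fix the range, so this is an assumption. Exactly two thirds of such integers are divisible by two or by three. Yet the universal proposition fails at once, because some integers are divisible by neither.

```python
from fractions import Fraction

def share_divisible_by_2_or_3(limit):
    """Proportion of the integers 1..limit divisible by two or by three."""
    hits = sum(1 for k in range(1, limit + 1) if k % 2 == 0 or k % 3 == 0)
    return Fraction(hits, limit)

share = share_divisible_by_2_or_3(6000)
print(share, share > Fraction(1, 2))               # 2/3 True
# Counter-instances to the universal claim:
print([k for k in range(1, 15) if k % 2 and k % 3])  # [1, 5, 7, 11, 13]
```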

The  first  type  of  proposition  has  been  discussed  in  Part  III. 
under  the  name  of  Universal  Induction.  The  latter  belongs  to 
Inductive  Correlation  or  Statistical  Induction,  an  attempt  at  the 
logical  analysis  of  which  must  be  my  final  task. 

2.  What  advocates  of  the  Frequency  Theory  of  Probability 
wrongly  believe  to  be  characteristic  of  all  probabilities,  namely, 
that  they  are  essentially  concerned  not  with  single  instances  but 
with  series  of  instances,  is,  I  think,  a  true  characteristic  of 
statistical  induction.  A  statistical  induction  either  asserts  the 
probability  of  an  instance  selected  at  random  from  a  series  of 
propositions,  or  else  it  assigns  the  probability  of  the  assertion, 
that  the  truth  frequency  of  a  series  of  propositions  (i.e.  the 
proportion of true propositions in the series) is in the neighbourhood
of a given value. In either case it is asserting a characteristic
of a series of propositions, rather than of a particular
proposition. 

Whilst,  therefore,  our  unit  in  the  case  of  Universal  Induction 
is  a  single  instance  which  satisfies  both  the  condition  and  the 
conclusion  of  our  generalisation,  our  unit  in  the  case  of  Statistical 
Induction is not a single instance, but a set or series of instances,
all  of  which  satisfy  the  condition  of  our  generalisation  but 
which  satisfy  the  conclusion  only  in  a  certain  proportion  of  cases. 
And  whilst  in  Universal  Induction  we  build  up  our  argument  by 
examining  the  known  positive  and  negative  Analogy  shown  in  a 
series  of  single  instances,  the  corresponding  task  in  Statistical 
Induction  consists  in  examining  the  Analogy  shown  in  a  series  of 
series  of  instances. 

3.  We  are  presented,  in  problems  of  Statistical  Induction,  with 
a set of instances all of which satisfy the conditions of our generalisation,
and a proportion $f$ of which satisfy its conclusion; and
we  seek  to  generalise  as  to  the  probable  proportion  in  which 
further  instances  will  satisfy  the  conclusion. 

Now it is useless merely to pay attention to the proportion (or
frequency) $f$ discovered in the aggregate of the instances. For
any collection whatever, comprising a definite number of objects,
must, if the objects be classified with reference to the presence
or absence of any specified characteristic whatever, show some
definite proportion or statistical frequency of occurrence; so that
a mere knowledge of what this frequency is can have no appreciable
bearing on what the corresponding frequency will be for
some  other  collection  of  objects,  or  on  the  probability  of  finding 
the  characteristic  in  an  object  which  does  not  belong  to  the 
original  collection.  We  should  be  arguing  in  the  same  sort  of 
way  as  if  we  were  to  base  a  universal  induction  as  to  the 
concurrence  of  two  characteristics  on  a  single  observation  of  this 
concurrence,  and  without  any  analysis  of  the  accompanying 
circumstances. 
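Why the aggregate frequency alone says so little can be seen from two invented collections. Each has 1000 instances in ten sub-series of 100, and each has the same aggregate frequency of 10 per cent. In one collection every sub-series shows 10 per cent. In the other, all the events fall in a single sub-series. The sketch reports the aggregate frequency together with the ratio of the observed variance of the sub-series frequencies to the Bernoullian variance, which is the square of Lexis's $Q$. The sub-series are assumed to be of equal size.

```python
def aggregate_and_spread(subseries):
    """subseries: list of (successes, trials) of equal size.

    Returns (aggregate frequency, observed variance of the sub-series
    frequencies divided by the Bernoullian variance f(1 - f) / g).
    """
    succ = sum(s for s, _ in subseries)
    trials = sum(n for _, n in subseries)
    f = succ / trials
    freqs = [s / n for s, n in subseries]
    g = trials / len(subseries)  # instances per sub-series (assumed equal)
    observed_var = sum((x - f) ** 2 for x in freqs) / len(freqs)
    return f, observed_var / (f * (1 - f) / g)

# Two hypothetical collections of 1000 instances in ten sub-series of 100.
stable = [(10, 100)] * 10              # 10 per cent in every sub-series
lumpy = [(0, 100)] * 9 + [(100, 100)]  # all the events in one sub-series
print(aggregate_and_spread(stable))    # (0.1, 0.0)
print(aggregate_and_spread(lumpy))     # aggregate 0.1, ratio about 100
```

Only the analysis into sub-series distinguishes a collection that supports a generalisation about 10 per cent from one that plainly does not.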

Let  the  reader  be  clear  about  this.  To  argue  from  the  mere 
fact  that  a  given  event  has  occurred  invariably  in  a  thousand 
instances  under  observation,  without  any  analysis  of  the  circum 
stances  accompanying  the  individual  instances,  that  it  is  likely 
to  occur  invariably  in  future  instances,  is  a  feeble  inductive 
argument,  because  it  takes  no  account  of  the  Analogy.  Neverthe 
less  an  argument  of  this  kind  is  not  entirely  worthless,  as  we  have 
seen in Part III. But to argue, without analysis of the instances,
from  the  mere  fact  that  a  given  event  has  a  frequency  of  10  per 
cent  in  the  thousand  instances  under  observation,  or  even  in  a 
million  instances,  that  its  probability  is  1/10  for  the  next  instance, 
or  that  it  is  likely  to  have  a  frequency  near  to  1/10  in  a  further 
set of observations, is a far feebler argument; indeed it is hardly
an argument at all. Yet a good deal of statistical argument is not
free from this reproach; though persons of common sense often
conclude  better  than  they  argue,  that  is  to  say,  they  select  for 
credence,  from  amongst  arguments  similar  in  form,  those  in 
favour  of  which  there  is  in  fact  other  evidence  tacitly  known  to 
them  though  not  explicit  in  the  premisses  as  stated. 

4.  The  analysis  of  statistical  induction  is  not  fundamentally 
different  from  that  of  universal  induction  already  attempted  in 
Part  III.  But  it  is  much  more  intricate  ;  and  I  have  experienced 
exceptional  difficulty,  as  the  reader  may  discover  for  himself  in 
the  following  pages,  both  in  clearing  up  my  own  mind  about  it 
and  in  expounding  my  conclusions  precisely  and  intelligibly.  I 
propose  to  begin  with  a  few  examples  of  what  commonly  impresses 
us  as  good  arguments  in  this  field,  and  also  of  the  attendant 
circumstances  which,  if  they  were  known  to  exist,  might  be  held 
to  justify  such  a  mode  of  reasoning  ;  and,  having  thus  attempted 
to  bring  before  the  reader's  mind  the  character  of  the  subject- 
matter,  to  proceed  to  an  abstract  analysis. 

Example  One. — Let  us  investigate  the  generalisation  that  the 
proportion  of  male  to  female  births  is  m.  The  fact  that  the 
aggregate  statistics  for  England  during  the  nineteenth  century 
yield the proportion m would go no way at all towards justifying
the  statement  that  the  proportion  of  male  births  in  Cambridge 
next  year  is  likely  to  approximate  to  m.  Our  argument  would 
be  no  better  if  our  statistics,  instead  of  relating  to  England  during 
the  nineteenth  century,  covered  all  the  descendants  of  Adam. 
But  if  we  were  able  to  break  up  our  aggregate  series  of  instances 
into  a  series  of  sub-series,  classified  according  to  a  great  variety 
of  principles,  as  for  example  by  date,  by  season,  by  locality,  by 
the  class  of  the  parents,  by  the  sex  of  previous  children,  and  so 
forth,  and  if  the  proportion  of  male  births  throughout  these  sub- 
series  showed  a  significant  stability  in  the  neighbourhood  of  m, 
then  indeed  we  have  an  argument  worth  something.  Otherwise 
we  must  either  abandon  our  generalisation,  amplify  its  conditions, 
or  modify  its  conclusion. 

Example  Two. — Let  us  take  a  series  of  objects  s  all  alike  in 
some  specified  respect,  this  resemblance  constituting  membership 
of  the  class  F  ;  let  us  determine  of  how  many  members  of  the 
series a certain property $\phi$ is true, the frequency of which is to be
the subject of our generalisation: and if a proportion f of the
series s have the property $\phi$, we may say that the series s has a
frequency f for the property $\phi$.

Now if the whole field F has a finite number of constituents,
it must have some determinate frequency p, and if, therefore,
we increase the comprehensiveness of s until eventually it
includes the whole field, f must come in the end to be equal
to  p.  This  is  obvious  and  without  interest  and  not  what  we 
mean  by  the  law  of  great  numbers  and  the  stability  of  statistical 
frequency. 

Let us now divide up the field F, according to some
determinate principle of division D, into subfields $F_1$, $F_2$, etc.; and
let the series $s_1$ be taken from $F_1$, $s_2$ from $F_2$, and so on. Where
$F_1$, $F_2$, etc., have a finite number of constituents, $s_1$, $s_2$, etc., may
possibly coincide with them: if $s_1$, $s_2$, etc., do not coincide with
$F_1$, $F_2$, etc., but are chosen from them, let us suppose that they are
chosen according to some principle of random or unbiassed
selection; $s_1$, that is to say, will be a random sample from $F_1$.
Now it may happen that the frequencies $f_1$, $f_2$, etc., of the series
$s_1$, $s_2$, etc., thus selected cluster round some mean frequency f. If
the frequencies show this characteristic (the measurement and
precise determination of which I am not now considering), then the
series of series $s_1$, $s_2$, etc., has a stable frequency for the
classification D. 'Great numbers' only come in because it is difficult to
ascertain the existence of stable frequency unless the series $s_1$, $s_2$,
etc., are themselves numerous and unless each of these comprises
numerous individual instances.

Let us then apply a different principle of division $D'$, leading
to series $s_1'$, $s_2'$, etc., and to frequencies $f_1'$, $f_2'$, etc.; and then again
a third principle of division $D''$ leading to frequencies $f_1''$, $f_2''$, etc.;
and so on, to the full extent that our knowledge of the differences
between the individual instances permits us. If the frequencies
$f_1$, $f_2$, etc., $f_1'$, $f_2'$, etc., $f_1''$, $f_2''$, etc., and so on are all stable about f,
we have an inductive ground of some weight for asserting a
statistical generalisation.

Let the field F, for example, comprise all Englishmen in their
sixtieth year, and let the property $\phi$, about the frequency of
which we are generalising, be their death in that year of their age.
Now the field F can be divided into subfields $F_1$, $F_2$, etc., on
innumerable different principles. $F_1$ might represent Englishmen
in their sixtieth year in 1901, $F_2$ in 1902, and so on; or we might
classify them according to the districts in which they live; or
according to the amount of income tax they pay; or according as
they are in workhouses, in hospitals, in asylums, in prisons, or at
large. Let us take the second of these classifications and let the
subfields $F_1$, $F_2$, etc., be constituted by the districts in which they
live. If we take large random selections $s_1$, $s_2$, etc., from $F_1$, $F_2$,
etc., respectively, and find that the frequencies $f_1$, $f_2$, etc., fluctuate
closely round a mean value f, this can be expressed by the
statement that there is a stable frequency f for death in the
sixtieth  year  in  different  English  districts.  We  might  also  find 
a  similar  stability  for  all  the  other  classifications.  On  the  other 
hand,  for  the  third  and  fourth  classifications  we  might  find  no 
stability  at  all,  and  for  the  first  a  greater  or  less  degree  of  stability 
than  for  the  second.  In  the  latter  case  the  form  of  our  statistical 
generalisation  must  be  modified  or  the  argument  in  its  favour 
weakened. 
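The procedure of Example Two, applied to the Englishmen example, can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's method: instances are dictionaries, each principle of division is a function mapping an instance to the name of its subfield, and the helper names (`frequencies_by_division`, `stable_under_all`), the toy data and the fixed `tolerance` are hypothetical stand-ins for the precise measure of stability which the text deliberately postpones.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Instance = Dict[str, object]

def frequencies_by_division(instances: List[Instance],
                            divide: Callable[[Instance], str],
                            has_phi: Callable[[Instance], bool]) -> Dict[str, float]:
    """Frequency of phi in each sub-series produced by one principle of division."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for inst in instances:
        label = divide(inst)
        totals[label] += 1
        hits[label] += has_phi(inst)  # True counts as 1, False as 0
    return {label: hits[label] / totals[label] for label in totals}

def stable_under_all(instances: List[Instance],
                     divisions: List[Callable[[Instance], str]],
                     has_phi: Callable[[Instance], bool],
                     tolerance: float = 0.05) -> bool:
    """True if, under every principle of division supplied, each sub-series
    frequency lies within `tolerance` (a hypothetical threshold) of the
    frequency f of the whole set of instances."""
    f = sum(has_phi(i) for i in instances) / len(instances)
    return all(abs(fi - f) <= tolerance
               for divide in divisions
               for fi in frequencies_by_division(instances, divide, has_phi).values())

# Hypothetical field: 400 men in their sixtieth year, phi (death) for every tenth one.
field = [{"year": 1901 + k // 200,
          "district": "NSEW"[(k // 10) % 4],
          "institution": "workhouse" if k % 10 in (0, 1) else "at large",
          "died": k % 10 == 0}
         for k in range(400)]

died = lambda i: i["died"]
by_year = lambda i: str(i["year"])
by_district = lambda i: i["district"]
by_institution = lambda i: i["institution"]

print(stable_under_all(field, [by_year, by_district], died))
print(frequencies_by_division(field, by_institution, died))
```

In this toy field the frequency is stable by year and by district, but the classification by institution splits it into 0.5 and 0, which is the kind of failure the text anticipates for the third and fourth classifications.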

Example  Three. — Let  us  return  to  the  example  given  in  Chapter 
XXVII. of the dog which is fed sometimes by scraps at table
and  so  judges  it  reasonable  to  be  there.  From  one  year  to  another, 
let  us  assume,  the  dog  gets  scraps  on  a  proportion  of  days  more 
or  less  stable.  What  sorts  of  explanation  might  there  be  of 
this  ?  First,  it  might  be  the  case  that  he  was  fed  on  the  movable 
feasts  of  the  Church  ;  there  would  be  the  same  number  of  these 
in  each  year,  but  it  would  not  be  easy  for  any  one  who  had  not 
the  clue  to  discover  any  regularity  in  the  occasions  of  their 
individual  occurrence.  Second,  it  might  be  the  case  that  he 
was  given  scraps  whenever  he  looked  thin,  and  that  the  scraps 
were  withheld  whenever  he  looked  fat,  so  that  if  he  was  given 
scraps  on  one  day,  this  diminished  the  likelihood  of  his  getting 
scraps  on  the  next  day,  whilst  if  they  were  withheld  this  would 
increase  the  likelihood ;  the  dog's  constitution  remaining  constant, 
the  number  of  days  for  scraps  would  tend  to  fluctuate  from 
year  to  year  about  a  stable  value.  Third,  it  might  be  the  case 
that  the  company  at  table  varied  greatly  from  day  to  day,  and 
that  some  days  people  were  there  of  the  kind  who  give  dogs 
scraps  and  other  days  not ;  if  the  set  of  people  from  whom 
the  company  was  drawn  remained  more  or  less  the  same  from 
year  to  year,  and  it  was  a  matter  of  chance  (in  the  objective  sense 
defined  in  §  8  of  Chapter  XXIV.  above)  which  of  them  were 
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there  from  day  to  day,  the  proportion  of  days  for  scraps  might 
again  show  some  degree  of  stability  from  year  to  year.  Lastly, 
a  combination  between  the  first  and  third  type  of  circumstance 
gives  rise  to  a  variant  deserving  separate  mention.  It  might  be 
the  case  that  the  dog  was  only  given  scraps  by  his  master,  that 
his  master  generally  went  away  for  Saturday  and  Sunday,  and 
was  at  home  the  rest  of  the  week  unless  something  happened 
to  the  contrary,  and  that  "  chance  "  causes  would  sometimes 
intervene  to  keep  him  at  home  for  the  week-end  and  away  in 
the  week  ;  in  this  case  the  frequency  of  days  for  scraps  would 
probably fluctuate in the neighbourhood of five-sevenths. In
circumstances of this third type, however, the degree of stability
would probably be less than in circumstances of the first two
types; and in order to get a really stable frequency it might
be necessary to take a longer period than a year as the basis
for each series of observations, or even to take the average for
a number of dogs placed in like circumstances instead of one
dog only.
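To see how the second type of circumstance can produce stability, the sketch below simulates it alongside plain independent chance. It is illustrative only: the 50 per cent daily rate, the feedback rates of 0.1 and 0.9, and the function name `simulate_years` are assumptions, not anything given in the text.

```python
import random
import statistics
from typing import List

def simulate_years(n_years: int, mechanism: str, days: int = 365, seed: int = 0) -> List[float]:
    """Proportion of 'days for scraps' in each year under a hypothetical mechanism.

    'independent': every day is an independent 50/50 chance.
    'feedback':    scraps yesterday make scraps today unlikely (p = 0.1), no scraps
                   yesterday make them likely (p = 0.9), as when the dog is fed
                   whenever he looks thin and refused whenever he looks fat."""
    rng = random.Random(seed)
    fed_yesterday = False
    freqs = []
    for _ in range(n_years):
        fed_days = 0
        for _ in range(days):
            if mechanism == "independent":
                p = 0.5
            else:
                p = 0.1 if fed_yesterday else 0.9
            fed_yesterday = rng.random() < p
            fed_days += fed_yesterday
        freqs.append(fed_days / days)
    return freqs

independent = simulate_years(200, "independent")
feedback = simulate_years(200, "feedback")
for name, freqs in (("independent", independent), ("feedback", feedback)):
    print(name, round(statistics.mean(freqs), 3), round(statistics.pstdev(freqs), 4))
```

Both mechanisms give a yearly frequency near one half, but under feedback the frequency spreads from year to year much less than independent chance would allow: a dispersion smaller than random selection from a single universe would lead one to expect.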

It  has  been  assumed  so  far  that  we  have  an  opportunity  of 
observing what happens on every day of the year. If this is
not  the  case  and  we  have  knowledge  only  of  a  random  sample 
from  the  days  of  each  year,  then  the  stability,  though  it  will  be 
less  in  degree,  may  be  nevertheless  observable,  and  will  increase 
as  the  number  of  days  included  in  each  sample  is  increased. 
This  applies  equally  to  each  of  the  three  types. 
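The effect of sampling can be illustrated in the same way. The sketch below assumes a hypothetical year with scraps on 183 of its 365 days and measures how widely the observed frequency varies across many random samples of days drawn without replacement; the function name and the choice of 2,000 trials are illustrative.

```python
import random
import statistics
from typing import List

def spread_of_sample_frequency(year: List[int], sample_size: int,
                               trials: int = 2000, seed: int = 1) -> float:
    """Standard deviation of the frequency of scraps observed in random samples
    of `sample_size` days, drawn without replacement from one year's record."""
    rng = random.Random(seed)
    freqs = [sum(rng.sample(year, sample_size)) / sample_size for _ in range(trials)]
    return statistics.pstdev(freqs)

# Hypothetical year: 1 marks a day with scraps, 0 a day without.
year = [1] * 183 + [0] * 182
spreads = [spread_of_sample_frequency(year, n) for n in (30, 100, 300)]
print([round(s, 4) for s in spreads])
```

The spread falls as the sample grows, and vanishes when every day of the year is included.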

5.  What  is  the  correct  logical  analysis  of  this  sort  of  reasoning  ? 
If  an  inductive  generalisation  is  a  true  one,  the  conclusion  which 
it  asserts  about  the  instance  under  inquiry  is,  so  far  as  it  goes, 
definite  and  final,  and  cannot  be  modified  by  the  acquisition  of 
more  detailed  knowledge  about  the  particular  instance.  But  a 
statistical  induction,  when  applied  to  a  particular  instance,  is 
not  like  this  :  for  the  acquisition  of  further  knowledge  might 
render  the  statistical  induction,  though  not  in  itself  less  probable 
than before, inapplicable to that particular instance.

This  is  due  to  the  fact  that  a  statistical  induction  is  not  really 
about the particular instance at all, but has as its subject, about
which it generalises, a series; and it is only applicable to the
particular instance in so far as the instance is, relative to our
knowledge, a random member of the series. If the acquisition of
new  knowledge  affords  us  additional  relevant  information  about 
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the  particular  instance,  so  that  it  ceases  to  be  a  random  member 
of  the  series,  then  the  statistical  induction  ceases  to  be  applicable ; 
but  the  statistical  induction  does  not  for  that  reason  become 
any  less  probable  than  it  was — it  is  simply  no  longer  indicated 
by  our  data  as  being  the  statistical  generalisation  appropriate 
to  the  instance  under  inquiry.  The  point  is  illustrated  by  the 
familiar  example  that  the  probability  of  an  unknown  individual 
posting  a  letter  unaddressed  can  be  based  on  the  statistics  of 
the  Post  Office,  but  my  expectation  that  I  shall  act  thus,  cannot 
be  so  determined. 

Thus  a  statistical  generalisation  is  always  of  the  form  :  '  The 
probability,  that  an  instance  taken  at  random  from  the  series 
S will have the characteristic $\phi$, is p'; or, more precisely, if a is
a random member of $S(x)$, the probability of $\phi(a)$ is p.

It  will  be  convenient  to  recapitulate  from  Chapter  XXIV.  §  11 
the definition of 'an instance taken at random': Let $\phi(x)$
stand for 'x has the characteristic $\phi$', and $S(x)$ for 'x is a member
of the class S'; then, on evidence h, a is a random member
of the class S for characteristic $\phi$, if 'x is a' is irrelevant to
$\phi(x)/S(x).h$,¹ i.e. if we have no information about a relevant
to $\phi(a)$ except $S(a)$.

Or alternatively we might express our definition as follows:
Consider a particular instance a, where the object of our inquiry
is the probability of $\phi(a)$ relative to evidence h. Let us discard
that part of our knowledge $h(a)$ which is irrelevant to $\phi(a)$,
leaving us with relevant knowledge $h'(a)$. Let the class of
instances $a_1$, $a_2$, etc., which satisfy $h'(x)$ be designated by S. Then,
relative to evidence h, a is a random member of the class or
series S for the characteristic $\phi$.
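The second form of the definition suggests a mechanical recipe, sketched below on a hypothetical toy population. Everything hinges on the list `relevant`, which is supplied by hand: judging which part of $h(a)$ is irrelevant to $\phi(a)$ is exactly the logical work the definition leaves to us, and the code does not attempt it. The attribute names, data and helper names are invented for illustration.

```python
from typing import Dict, List, Sequence

Record = Dict[str, object]

def reference_class(population: List[Record], known_about_a: Record,
                    relevant: Sequence[str]) -> List[Record]:
    """The class S of instances satisfying h'(x): those agreeing with a on every
    attribute we know about a and judge relevant; the rest of h(a) is discarded."""
    h_prime = {k: v for k, v in known_about_a.items() if k in relevant}
    return [x for x in population if all(x.get(k) == v for k, v in h_prime.items())]

def frequency(series: List[Record], phi: str) -> float:
    """Proportion of the series for which phi holds; fails if S is empty."""
    if not series:
        raise ValueError("no determinate series: S is empty")
    return sum(bool(x[phi]) for x in series) / len(series)

population = [
    {"age": 60, "district": "N", "hair": "red",  "died": True},
    {"age": 60, "district": "N", "hair": "dark", "died": False},
    {"age": 60, "district": "N", "hair": "dark", "died": False},
    {"age": 60, "district": "N", "hair": "fair", "died": False},
    {"age": 60, "district": "S", "hair": "red",  "died": False},
    {"age": 40, "district": "N", "hair": "red",  "died": False},
]
a = {"age": 60, "district": "N", "hair": "red"}

S = reference_class(population, a, relevant=["age", "district"])
print(len(S), frequency(S, "died"))
```

Treating hair colour as relevant would shrink S to a single instance and change the answer from 1/4 to 1, so the choice of relevant knowledge is not a detail but the whole question.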

Let us denote the proposition 'x is, on evidence h, a random
member of S for characteristic $\phi$' by $R(x, S, \phi, h)$; then our
statistical generalisation is of the form $\phi(x)/R(x, S, \phi, h).h = p$.

If $R(a, S, \phi, h)$ holds, then, on evidence h, S is the appropriate
statistical series to which to refer a for the purposes of the
characteristic $\phi$.

It is not always the case that the evidence indicates any
series at all as 'appropriate' in the above sense. In particular,
if evidence h indicates S as the appropriate series, and evidence
h' indicates S' as the appropriate series, then relative to evidence
hh' (assuming these to be not incompatible), it may be the case
that no determinate series is indicated as appropriate. In this
case the method of statistical induction fails us as a means of
determining the probability under inquiry.

¹ The use of variables in probability, as has been pointed out on p. 58, is
very dangerous. It might therefore be better to enunciate the above: a is a
random member of S for characteristic $\phi$, if $\phi(a)/S(a).h = \phi(b)/S(b).h$, where
$S(b).h$ contains no information about b, except that b is a member of S.

6. We can now remove our attention from the individual
instance a to the properties of the series S. What sort of evidence
is capable of justifying the conclusion that p is the probability
that a random member of the series S will have the
characteristic $\phi$?

In the simplest case, S is a finite series of which we know the
truth frequency for the characteristic $\phi$, namely f.¹ Then by a
straightforward application of the Principle of Indifference we
have p = f, so that $\phi(x)/R(x, S, \phi, h).h = f$.

In another important type S is a series, with an indefinite
number of members which, however, group themselves in such
a way that for every member of which $\phi(x)$ is true, there
corresponds a determinate number of members of which $\phi(x)$ is
false. The series, that is to say, contains an indefinite number
of atoms, but each atom is made up of a set of molecules of
which $\phi(x)$ is true and false respectively in fixed and determinate
proportions. If this determinate proportion is known to be f, we
have, as before, p = f. The typical instance of this type is afforded
by games of chance. Every possible state of affairs which might
lead to a divergence in one direction is balanced by another
probability leading in the opposite direction; and these
alternative possibilities are of a kind to which the Principle of Indifference
is applicable. Thus for every poise of the dice box which leads
to the fall of the six-face, there is a corresponding poise which
leads to the fall of each of the other faces; so that if S is the
series of possible poises, we may equate p to 1/6 where $\phi$ is the fall
of the six-face. It is not necessary, in order to obtain this
result, to assert that S is a finite series with an actual determinate
frequency f for the fall of each face.
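The atomic structure just described can be checked with exact fractions. In this toy sketch, which is an assumption-laden illustration rather than anything in the text, each atom of the die's series of poises is a list of six members, one per face, and $\phi$ is the fall of the six-face; the helper name `proportion_in_series` is hypothetical. The proportion is the same however many atoms are taken, which is why no actual finite total need be asserted.

```python
from fractions import Fraction
from typing import Sequence

def proportion_in_series(atom: Sequence[bool], n_atoms: int) -> Fraction:
    """Exact proportion of members with phi in a series of n_atoms copies of one
    atom, where the atom fixes how many members have phi and how many lack it."""
    series = list(atom) * n_atoms
    return Fraction(sum(series), len(series))

# One poise for each face of the die; phi is 'the six-face falls'.
die_atom = [face == 6 for face in range(1, 7)]
for n in (1, 10, 1000):
    print(n, proportion_in_series(die_atom, n))
```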

So far no inductive element enters in. But in general we do
not know the constitution of S for certain, and can only infer it
inductively from its resemblance to other series of which we know
the constitution. This presents a normal inductive problem:
the determination by an analysis of the positive and negative
analogies as to whether the respects in which S differs or may
differ from the other series are or are not relevant in the particular
context $\phi$; and it involves the same sort of considerations as
those discussed in Part III.

¹ I.e. if f is the proportion of the members of the series for which $\phi(x)$ is true.

There  is,  however,  a  further  difficulty  to  be  introduced  before 
we  have  reached  the  typical  statistical  problem.  In  the  case 
now  to  be  considered  our  actual  data  do  not  consist  of  positive 
knowledge  of  the  constitutions  either  of  S  itself  or  of  other  series 
more  or  less  resembling  S,  but  only  of  the  frequency  of  the 
characteristic  in  actually  observed  sets  of  selections,  great  or 
small,  either  from  S  itself  or  from  other  series  more  or  less 
resembling  S. 

Thus in the most general case our inquiry falls into two parts.
We are given the observed frequency in statistical sets selected
from $S_1$, $S_2$, etc., respectively. The first part of our inquiry is
the problem of arguing from these observed frequencies to the
probable constitutions of $S_1$, $S_2$, etc., i.e. of determining the values
of $\phi(x)/R(x, S_1, \phi, h).h$, etc.; we may call this part the statistical
problem. The second part of our inquiry is the problem of
arguing from the probable constitutions of $S_1$, $S_2$, etc., to the
probable constitution of S, where S, $S_1$, $S_2$ resemble one another
more or less, and we have to determine whether the differences
are or are not relevant to our inquiry; we may call this part the
inductive problem.

Now if the observed statistical sets are made up of random
instances of $S_1$, $S_2$, etc., we can argue in certain conditions from
the observed frequencies to the probable constitutions of the
series, out of which the random selections have been made, by
an inverse application of Bernoulli's Theorem on the lines
explained in Chapter XXXI. Moreover, if the series $S_1$, $S_2$, etc.,
are finite series and the observed selections cover a great part
of their members, we can reach an at least approximate
conclusion without raising all the theoretical difficulties or satisfying
all the conditions of Chapter XXXI. The commonly received
opinions as to the bearing of the observed frequencies in a
random sample on the constitution of the universe out of which
the sample is drawn, though generally stated too precisely and
without sufficient insistence on the assumptions they involve,
our actual evidence not warranting in general more than an
approximate result, are not, I think, fundamentally erroneous.
The most usual error in modern method consists in treating too
lightly what I have termed above the inductive problem, i.e.
the problem of passing from the series $S_1$, $S_2$, etc., of which we
have observed samples, to the series S of which we have not
observed samples.
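For the statistical problem, the familiar textbook procedure can be sketched as follows. It is a swapped-in technique, namely the normal approximation to the binomial with an optional finite-population correction for the case where the selections cover much of a finite series, and not the inverse use of Bernoulli's Theorem as developed in Chapter XXXI; the function name and the conventional multiplier 1.96 are assumptions. The resulting interval should be read as approximate only.

```python
import math
from typing import Optional, Tuple

def estimate_constitution(hits: int, sample_size: int,
                          population_size: Optional[int] = None,
                          z: float = 1.96) -> Tuple[float, Tuple[float, float]]:
    """Observed frequency in a random sample from a series S_i, with an approximate
    interval for the frequency in S_i itself (normal approximation)."""
    f = hits / sample_size
    se = math.sqrt(f * (1 - f) / sample_size)
    if population_size is not None:
        # Finite-population correction: the uncertainty vanishes once the whole series is seen.
        se *= math.sqrt((population_size - sample_size) / (population_size - 1))
    return f, (f - z * se, f + z * se)

print(estimate_constitution(300, 1000))
print(estimate_constitution(300, 1000, population_size=1200))
```

The second call shows the point about finite series: when the sample covers most of the series, the interval narrows sharply, and it closes entirely when the sample is the whole series.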

Let us, then, assume that we have ascertained $p_1$, $p_2$, etc., with
more or less exactness, by examining either all the instances of
the series $S_1$, $S_2$, etc., or random selections from them, i.e.
$\phi(x)/R(x, S_1, \phi, h).h = p_1$, etc. This can be expressed for short by saying
that the series $S_1$, $S_2$, etc., are subject to probable-frequencies
$p_1$, $p_2$, etc., for the characteristic $\phi$. Our problem is to infer from
this the probable-frequency p of the unexamined series S. The
class characteristics of the series $S_1$, $S_2$, etc., will be partly the same
and partly different. Using the terminology of Part III. we
may term the class characteristics which are common to all of
them the Positive Analogy, and the class characteristics which
are not common to all of them the Negative Analogy.

Now, if the observed or inferred probable-frequencies of
the series $S_1$, $S_2$, etc., are to form the basis of a statistical induction,
they must show a stable value; that is to say, either we must
have $p_1 = p_2 = \ldots$, or at least $p_1$, $p_2$, etc., must be stably grouped
about their mean value. Our next task, therefore, must be
to discover whether the probable-frequencies $p_1$, $p_2$, etc., display
a significant stability. It is the great merit of Lexis that he was
the first to investigate the problem of stability and to attempt its
measurement. For, until a prima facie case has been established
for the existence of a stable probable-frequency, we have but
a flimsy basis for any statistical induction at all; indeed we are
limited to the class of case where the instance under inquiry is
a member of identically the same series as that from which our
samples were drawn, i.e. where $S = S_1$, which in social and scientific
inquiries is seldom the case.

What is the meaning of the assertion that $p_1$, $p_2$, etc., are
stably grouped about their mean value? The answer is not
simple and not perfectly precise. We could propound various
formulae for the measurement of stability and dispersion,
respectively, and the problem of translating the conception of stability,
which is not quantitatively precise, into a numerical formula
involves an arbitrary or approximative element. For practical
purposes, however, I doubt if it is possible to improve on Lexis's
measure of stability Q, the mathematical definition of which
has been given above on p. 399. Lexis describes the stability
as subnormal, normal, or supernormal according as Q is less than,
equal to, or greater than 1. This is too precise, and it is better
perhaps to say that the stability about the mean is normal if
the dispersion is such as would not be improbable a priori, if
we had assumed that the members of $S_1$, $S_2$, etc., were obtained
by random selection out of a single universe U, that it is
subnormal if the dispersion is less than one would have expected on
the same hypothesis, and that it is supernormal if the dispersion
is greater than one would have expected.
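A sketch of a Lexis-type ratio follows. It assumes the usual modern statement of Lexis's ratio, namely the observed spread of the frequencies $p_1$, $p_2$, etc., about the pooled frequency, divided by the spread expected if every member had been drawn at random from one universe U; the normalisation given on p. 399 may differ in detail. The rule in `classify_dispersion`, which counts Q as normal unless it departs from 1 by more than about two rough standard errors (taking $1/\sqrt{2(k-1)}$ for k series), is a hypothetical way of replacing an exact comparison with 1 by the looser test "not improbable a priori".

```python
import math
from typing import Sequence

def lexis_q(hits: Sequence[int], sizes: Sequence[int]) -> float:
    """Observed spread of the sub-series frequencies about the pooled frequency,
    relative to the spread expected under random selection from one universe U."""
    k = len(hits)
    p = sum(hits) / sum(sizes)                     # pooled frequency, taken as that of U
    freqs = [h / n for h, n in zip(hits, sizes)]
    observed_var = sum((f - p) ** 2 for f in freqs) / k
    expected_var = p * (1 - p) * sum(1 / n for n in sizes) / k
    return math.sqrt(observed_var / expected_var)

def classify_dispersion(q: float, k: int) -> str:
    """Hypothetical rule: 'normal' unless Q lies more than about two rough
    standard errors, 1/sqrt(2(k-1)), away from 1."""
    band = 2 / math.sqrt(2 * (k - 1))
    if q < 1 - band:
        return "subnormal"
    if q > 1 + band:
        return "supernormal"
    return "normal"

q = lexis_q([20, 30, 40, 30], [100] * 4)
print(round(q, 3), classify_dispersion(q, 4), classify_dispersion(q, 40))
```

With only four series a value of Q near 1.5 still counts as normal, while the same value across forty series would not; this is one way of making concrete why comparing Q exactly with 1 is too precise.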

Let us suppose that we find that on this definition $p_1$, $p_2$, etc.,
are stable about p, and let us postpone consideration of the cases
of subnormal or supernormal dispersion. This is equivalent to
saying that the frequencies of $S_1$, $S_2$, etc., are within limits which
we should expect a priori, if we had knowledge relative to which
their  members  were  chosen  at  random  from  a  universe  U  of  which 
the  frequency  was  p  for  the  characteristic  under  inquiry.  We 
next  seek  to  extend  this  result  to  the  unexamined  series  S  and  to 
justify  anticipations  about  it  on  the  basis  of  the  members  of  S 
also  being  chosen  at  random  from  the  universe  U.  This  leads  us 
to  the  strictly  inductive  part  of  our  inquiry. 

The class characteristics of the several series $S_1$, $S_2$, etc., will be
partly the same and partly different, those that are the same
constituting the positive analogy and those that are different
constituting the negative analogy, as stated above. The series
S will share part of the positive analogy. The argument for
assimilating the properties of S, in relation to the characteristic
under inquiry, to the properties of $S_1$, $S_2$, etc., in relation to this
characteristic depends on the differences between S, $S_1$, $S_2$, etc.,
being irrelevant in this particular connection. The method of
strengthening this argument seems to me to be the same as the
general inductive method discussed in Part III. and to present
the same, but not greater, difficulties.

In general this inductive part of our inquiry will be best
advanced by classifying the aggregate series of instances with
which we are presented in such a way as to analyse most clearly
the significant positive and negative analogies, to group them,
that is to say, into sub-series $S_1$, $S_2$, etc., which show the most
marked and definite class characteristics. Our knowledge of the
differences between the particular observed instances which
constitute our original data will suggest to us one or more
principles of classification, such that the members of each
sub-series all have in common some set of positive or negative
characteristics, not all of which are shared in common by all the
members of any of the other sub-series. That is to say, we
classify our whole set of instances into a series of series $S_1$, $S_2$, etc.,
which have frequencies $f_1$, $f_2$, etc., for the characteristic under
inquiry; and then again we classify them by another principle or
criterion of classification into a second series of series $S_1'$, $S_2'$, etc.,
with frequencies $f_1'$, $f_2'$, etc.; and so on, so far as our knowledge of
the possible relevant differences between the instances extends;
the whole result being then summed up in a statement of the
positive and negative analogies of the series of series. If we then
find that all the frequencies $f_1$, $f_2$, etc., $f_1'$, $f_2'$, etc., are stable about
a value p, and if, on the basis of the above positive and negative
analogies, we have a normal inductive argument for assimilating
the unexamined series S to the examined series $S_1$, $S_2$, etc., $S_1'$, $S_2'$,
etc., in respect of the characteristic under inquiry, in this case we
have, not conclusive grounds, but grounds of some weight for
asserting the probability p, that an instance taken at random
from S will have the characteristic in question.

Let me recapitulate the two essential stages of the
argument. We first find that the observed frequencies in a set of
series are such as would have been not improbable a priori if,
relative to our knowledge, these series had all been made up of
random members of the same universe U; and we next argue
that the positive and negative analogies of this set of series
furnish an inductive argument of some weight for supposing that
a further unexamined series S resembles the former series in
having a frequency for the characteristic under inquiry such as
would have been not improbable a priori if, relative to our
knowledge, S was also made up of random members of the
hypothetical universe U.

7. It is very perplexing to decide how far an argument of
this character involves any new and theoretically distinct
difficulties or assumptions, beyond those already admitted
as inherent in Universal Induction. I believe that the
foregoing analysis is along the right lines and that it carries the
inquiry a good deal further than it has been carried hitherto.
But it is not conclusive, and I must leave to others its more
exact elucidation.

There  is,  however,  a  little  more  to  be  said  about  the  half-felt 
reasons  which,  in  my  judgment,  recommend  to  common  sense 
some  at  least  of  the  scientific  (or  semi-scientific)  arguments 
which  run  along  the  above  lines.  In  expressing  these  reasons  I 
shall  be  content  to  use  language  which  is  not  always  as  precise  as 
it  ought  to  be. 

I  gave  in  Chapter  XXIV.  §§  7-9  an  interpretation  of  what  is 
meant  by  an  '  objectively  chance  '  occurrence,  in  the  sense  in 
which  the  results  of  a  game,  such  as  roulette,  may  be  said  to  be 
governed  by  '  objective  chance.'  This  interpretation  was  as 
follows  :  "  An  event  is  due  to  objective  chance  if  in  order  to 
predict  it,  or  to  prefer  it  to  alternatives,  at  present  equi-probable, 
with  any  high  degree  of  probability,  it  would  be  necessary  to 
know  a  great  many  more  facts  of  existence  about  it  than  we 
actually  do  know,  and  if  the  addition  of  a  wide  knowledge  of 
general  principles  would  be  little  use."  The  ideal  instance  of 
this  is  the  game  of  chance  ;  but  there  are  other  examples  afforded 
by  science  in  which  these  conditions  are  fulfilled  with  more  or 
less  perfection.  Now  the  field  of  statistical  induction  is  the  class 
of  phenomena  which  are  due  to  the  combination  of  two  sets  of 
influences,  one  of  them  constant  and  the  other  liable  to  vary  in 
accordance  with  the  expectations  of  objective  chance, — Quetelet's 
'  permanent  causes  '  modified  by  '  accidental  causes.'  In  social 
and  physical  statistics  the  ultimate  alternatives  are  not  as  a  rule 
so  perfectly  fixed,  nor  the  selection  from  them  so  purely  random, 
as  in  the  ideal  game  of  chance.  But  where,  for  example,  we  find 
stability  in  the  statistics  of  crime,  we  could  explain  this  by 
supposing  that  the  population  itself  is  stably  constituted,  that 
persons  of  different  temperaments  are  alive  in  proportions  more 
or  less  the  same  from  year  to  year,  that  the  motives  for  crime  are 
similar,  and  that  those  who  come  to  be  influenced  by  these 
motives  are  selected  from  the  population  at  large  in  the  same 
kind  of  way.  Thus  we  have  stable  causes  at  work  leading  to  the 
several  alternatives  in  fixed  proportions,  and  these  are  modified 
by  random  influences.  Generally  speaking,  for  large  classes  of 
social  statistics  we  have  a  more  or  less  stable  population  including 
different  kinds  of  persons  in  certain  proportions  and  on  the  other 
hand sets of environments; the proportions of the different
kinds of persons, the proportions of the different kinds of environments,
and the manner of allotting the environments to the
persons  vary  in  a  random  manner  from  year  to  year  (or,  it  may  be, 
from  district  to  district).  In  all  such  cases  as  these,  however, 
prediction  beyond  what  has  been  observed  is  clearly  open  to 
sources  of  error  which  can  be  neglected  in  considering,  for 
example,  games  of  chance  ; — our  so-called  '  permanent '  causes 
are  always  changing  a  little  and  are  liable  at  any  moment  to 
radical  alteration. 

Thus  the  more  closely  that  we  find  the  conditions  in  scientific 
examples  assimilated  to  those  in  games  of  chance,  the  more 
confidently  does  common  sense  recommend  this  method.  The 
rather  surprising  frequency  with  which  we  find  apparent  stability 
in human statistics may possibly be explained, therefore, if the
biological theory of Mendelism can be established. According to
this theory the qualities apparent in any generation of a given
race  appear  in  proportions  which  are  determined  by  methods 
very  closely  analogous  to  those  of  a  game  of  chance.  To  take  a 
specific  example  (I  am  giving  not  the  correct  theory  of  sex  but  an 
artificially  simplified  form  of  it),  suppose  there  are  two  kinds  of 
spermatozoa  and  two  kinds  of  ova  and  of  the  four  possible  kinds 
of  union  two  produce  males  and  two  females,  then  if  the  kinds  of 
spermatozoa  and  ova  exist  in  equal  numbers  and  their  union  is 
determined  by  random  considerations  in  precisely  the  same  sense 
in  which  a  game  of  chance  such  as  roulette  depends  upon  random 
considerations,  we  should  expect  the  observed  proportions  to 
vary  from  equality,  as  indeed  they  do,  in  the  same  manner  as 
variations  from  equality  of  red  and  black  occur  at  roulette.1  If 
the sphere of influence of Mendelian considerations is wide, we
have  both  an  explanation  in  part  of  what  we  observe  and  also  a 
large  opportunity  in  future  of  using  with  profit  the  methods  of 
statistical  analysis. 
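The simplified Mendelian scheme above can be imitated with a pseudo-random generator. The sketch below illustrates the argument under stated assumptions and is not a model of actual heredity. The labels `A`, `B`, `a`, `b` and the rule that the unions A-a and B-b produce males are hypothetical choices. They satisfy only the text's condition that two of the four equally likely unions produce males. The sketch compares the year-to-year standard deviation of the proportion of males with the Bernoullian value $\sqrt{pq/n}$, which is what a roulette-like game would lead us to expect.

```python
import random
import statistics

# Hypothetical labels for the artificially simplified scheme in the text:
# two kinds of spermatozoa ("A", "B") and two kinds of ova ("a", "b").
# By assumption, the unions A-a and B-b produce males, A-b and B-a females.
MALE_UNIONS = {("A", "a"), ("B", "b")}


def male_proportion(births: int, rng: random.Random) -> float:
    """Proportion of males among `births` unions chosen at random."""
    males = 0
    for _ in range(births):
        # Equal numbers of each kind, united at random as at roulette.
        union = (rng.choice("AB"), rng.choice("ab"))
        if union in MALE_UNIONS:
            males += 1
    return males / births


def simulate_years(years: int, births: int, seed: int = 0) -> list[float]:
    """One male proportion per 'year', each year having `births` births."""
    rng = random.Random(seed)
    return [male_proportion(births, rng) for _ in range(years)]


if __name__ == "__main__":
    years, births = 200, 1000
    props = simulate_years(years, births)
    p = 0.5  # two male unions out of four equally likely ones
    bernoullian_sd = (p * (1 - p) / births) ** 0.5  # about 0.0158
    print(f"mean proportion of males: {statistics.mean(props):.4f}")
    print(f"observed s.d.:    {statistics.pstdev(props):.4f}")
    print(f"Bernoullian s.d.: {bernoullian_sd:.4f}")
```

With 200 years of 1000 births each, the observed standard deviation should come out close to the Bernoullian 0.0158. The exact figures depend on the seed.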

This is all familiar. This is the way in which in fact we do
think and argue. The inquiry as to how far it is covered by the
abstract analysis of the preceding paragraphs, and by what
logical principle the use of this analysis can be justified as rational,
I have pushed as far as I can. It deserves a profounder study
than logicians have given it in the past.

1 The fluctuations in the proportion of the sexes which, as is well known,
is not in fact one of equality, correspond, as Lexis has shown, to what one
would expect in a game of chance with an astonishing exactitude. But
it is difficult to find any other example, amongst natural or social phenomena,
in which his criteria of stability are by any means as well satisfied.

8.  Two  subsidiary  questions  remain  to  be  mentioned.  The 
first  of  these  relates  to  the  character  of  series  which,  in  the 
terminology  of  Lexis,  show  a  subnormal  or  supernormal  stability  ; 
for  I  have  pressed  on  to  the  conclusion  of  the  argument  on  the 
assumption  that  the  stabilities  are  normal.  Subnormal  stability 
conceals  two  types  :  the  one  in  which  there  is  really  no  stability 
at  all  and  the  results  are  in  fact  chaotic  ;  and  the  other  in  which 
there  is  mutual  dependence  between  the  successive  instances  of 
such  a  kind  that  they  tend  to  resemble  one  another  so  that  any 
divergence from the normal tends to accentuate itself. Supernormal
stability corresponds in the other direction to the second
of  these  two  types  ;  that  is  to  say,  there  is  mutual  dependence  of 
a  regulative  kind  between  the  successive  instances  which  tends 
to  prevent  the  frequency  from  swinging  away  from  its  mean 
value.  The  case,  where  the  dog  was  fed  with  scraps  when  he 
looked  thin  and  not  fed  when  he  looked  fat,  illustrated  this. 
The  typical  example  of  this  type  is  where  balls  are  drawn  from 
urns,  containing  black  and  white  balls  in  certain  proportions  and 
not  replaced  ;  so  that  every  time  a  black  ball  is  drawn  the  next 
ball  is  more  likely  than  before  to  be  white,  and  there  is  a  tendency 
to redress any excess of either colour beyond the proper proportions.
Possibly the aggregate annual rainfall may afford a
further  illustration. 
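A short simulation makes the contrast between independent and mutually dependent draws concrete. The sketch assumes hypothetical figures: an urn of 50 black and 50 white balls, and sets of 50 draws. For each method of drawing it compares the observed dispersion of the proportion of black with the Bernoullian standard deviation $\sqrt{pq/n}$, a ratio of the kind Lexis used.

- **Without replacement.** This is the regulative case of the text. It should give a ratio below 1, that is, supernormal stability (about 0.71 in theory for these figures).
- **Pólya urn.** This is a swapped-in illustration of the opposite type, in which divergence accentuates itself. Each ball drawn is returned together with one more of its colour. It should give a ratio above 1, that is, subnormal stability (about 1.22 in theory).

```python
import random
import statistics


def without_replacement(black: int, white: int, n: int, rng: random.Random) -> float:
    """Proportion of black in n draws, the balls not being replaced."""
    urn = ["black"] * black + ["white"] * white
    return rng.sample(urn, n).count("black") / n


def with_replacement(black: int, white: int, n: int, rng: random.Random) -> float:
    """Proportion of black in n independent draws (balls replaced)."""
    p = black / (black + white)
    return sum(rng.random() < p for _ in range(n)) / n


def polya(black: int, white: int, n: int, rng: random.Random) -> float:
    """Polya urn: each ball drawn goes back with one more of its colour,
    so an early excess of either colour tends to accentuate itself."""
    b, w, hits = black, white, 0
    for _ in range(n):
        if rng.random() < b / (b + w):
            b, hits = b + 1, hits + 1
        else:
            w += 1
    return hits / n


def dispersion_ratio(draw, sets: int = 2000, black: int = 50, white: int = 50,
                     n: int = 50, seed: int = 0) -> float:
    """Observed s.d. of the proportion over many sets, divided by the
    Bernoullian s.d. sqrt(pq/n).  Below 1: supernormal stability;
    above 1: subnormal stability."""
    rng = random.Random(seed)
    props = [draw(black, white, n, rng) for _ in range(sets)]
    p = black / (black + white)
    return statistics.pstdev(props) / (p * (1 - p) / n) ** 0.5


if __name__ == "__main__":
    for name, draw in [("not replaced", without_replacement),
                       ("replaced", with_replacement),
                       ("Polya urn", polya)]:
        print(f"{name:>12}: {dispersion_ratio(draw):.2f}")
```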

Where  there  is  no  stability  at  all  and  the  frequencies  are  chaotic, 
the  resulting  series  can  be  described  as  '  non-statistical.'  Amongst 
'  statistical  series,'  we  may  term  '  independent  series  '  those  of 
which  the  instances  are  independent  and  the  stability  normal, 
and  '  organic  series,'  those  of  which  the  instances  are  mutually 
dependent  and  the  stability  abnormal,  whether  in  excess  or  in 
defect. 'Organic series' have been incidentally discussed
elsewhere in this volume. I shall not pursue them further now,
because  I  do  not  think  that  they  introduce  any  new  theoretical 
difficulty  into  the  general  problem  of  statistical  inference ; 
although  the  problem  of  fitting  them  into  the  general  theoretical 
scheme  is  not  easy.1 

1 The following more precise definitions bring these ideas into line with what
has gone before: consider the terms $a_1, a_2, \ldots, a_n$ of a series $s(x)$; let '$a_r$ is $g$' $= g_r$
and let $g_r/h = p_r$, where $h$ is our data. Then, if $g_r/g_s \ldots g_t \ldots h = p_r$ for all
values of $r, s, \ldots, t, \ldots$, the terms of the series are independent relative to $h$. If
$p_1 = p_2 = \cdots = p$, the terms are uniform. If the terms are both independent and
uniform, the series may be called an independent Bernoullian series, subject to
a Bernoullian probability $p$. If the terms are independent but not uniform, the
series may be called an independent compound series, subject to a compounded
probability $\frac{1}{n}\sum_r p_r$. If the terms are not independent, the series is an organic
series.

The same terminology can then be applied to the series $S_1, S_2, \ldots, S_n$, regarded
as members of the series of series $S(x)$. Let the frequencies of the series for the
characteristic under inquiry be $x_1, x_2, \ldots, x_n$, and let $x_1/h = \theta_1(x_1)$, i.e. $\theta_1(x_1)$ is the
probability of a frequency $x_1$ in the first series. Then if $x_r/x_s \ldots h = \theta_r(x_r)$ for all
values of $r, s$, etc., the frequencies are independent; and if $\theta_1(x) = \theta_2(x) = \cdots = \theta(x)$,
the frequencies are stable. If the frequencies are stable and independent, the
series of series may be called Gaussian. If the frequencies are stable and
independent, and if in addition each individual series is subject to a Bernoullian
probability, the probable dispersion of the frequency is normal and symmetrical.
If the individual series are organic, the dispersion of the frequencies may be
normal, subnormal, or supernormal. If the series of series is Gaussian, and the
individual series Bernoullian, we have the type of the perfect statistical series

9. The second question is concerned with the relation between
the Inductive Correlation, which has been the subject-matter of
this chapter, and the Correlation Coefficient, or, as I should prefer
to call it, the Quantitative Correlation, with which recent English
statistical theory has chiefly occupied itself. I do not propose
to discuss this theory in detail, because I suspect that it is much
more concerned, at any rate in its present form, with statistical
description than with statistical induction. The transition from
defining the 'correlation coefficient' as an algebraical expression
to its employment for purposes of inference is very far from
clear even in the work of the best and most systematic writers
on the subject, such as Mr. Yule and Professor Bowley.

In the notation employed in the earlier part of this chapter I
have classified each examined instance $a$ according as it did or
did not possess the characteristic $\phi$, i.e. satisfy the propositional
function $\phi(x)$, or, in other words, according as $\phi(a)$ was true or
false. Thus only two possible alternatives were contemplated,
and $\phi$ was not considered as a quantitative characteristic which
the instance could satisfy in greater or less degree. Equally the
common element in all the instances, required to constitute them
as instances for the purpose of our statistical generalisation (or,
as I have sometimes put it, required to satisfy the condition of the
generalisation), was regarded as definite and unique and not
capable of quantitative variation. That is to say, all the instances
satisfied a function $\psi(x)$, and the question was, what proportion
of them also satisfied the function $\phi(x)$. A typical example was
that of sex-ratio, $\psi(x)$ being the birth of a child and $\phi(x)$ its
sex, where there is no question of degree in either $\psi(x)$ or $\phi(x)$.

It might be the case, however, that the characteristics under
examination were capable of degree or quantitative variation;
for example $\psi(x)$ might be the age of the mother and $\phi(x)$ the
weight of the child at birth. In this case we should have a series
$\psi_1(x), \psi_2(x)$, etc., corresponding to the various age-periods of the
mothers, and a series $\phi_1(x), \phi_2(x)$, etc., corresponding to the various
weights of the children. Now if we concentrated our attention
on $\psi_1(x)$ and $\phi_1(x)$ alone, i.e. on mothers of a particular age and
the proportions of their children which had a particular weight
at birth, we have a one-dimensional problem of the same kind as
before; out of all the instances which satisfy $\psi_1(x)$ a certain
proportion satisfy $\phi_1(x)$ also. But clearly we can push our
observations further and we can take note what proportion of the
instances which satisfy $\psi_1(x)$ satisfy $\phi_2(x)$, $\phi_3(x)$, and so on, respectively;
and then we can do the same as regards the instances
which satisfy $\psi_2(x)$, $\psi_3(x)$, etc. The total results of this
two-dimensional set of observations can then be tabulated in what is
called a twofold correlation table. Thus if $f_{rs}$ is the proportion
of instances satisfying $\psi_s(x)$ which also satisfy $\phi_r(x)$ we have a
table as follows:

$$
\begin{array}{c|cccc}
 & \psi_1(x) & \psi_2(x) & \cdots & \psi_s(x) \\
\hline
\phi_1(x) & f_{11} & f_{12} & \cdots & f_{1s} \\
\phi_2(x) & f_{21} & f_{22} & \cdots & f_{2s} \\
\vdots & \vdots & \vdots & & \vdots \\
\phi_r(x) & f_{r1} & f_{r2} & \cdots & f_{rs}
\end{array}
$$

We could, further, increase the complexity and completeness
of our observations to any required degree. For example we
might take account also of $\theta(x)$, the age of the father, and
construct a threefold table where $f_{rst}$ is the proportion of instances
satisfying $\phi_r(x)$, $\psi_s(x)$, $\theta_t(x)$; and so on up to an $n$-fold table.

Clearly it is not necessary for the construction of tables of
this kind that $\phi(x)$ and $\psi(x)$ should stand for degrees of the same
quantitative characteristic; they might be any set of exclusive
alternatives; for example, $\psi(x)$ might be the colour of the baby's
eyes, and $\phi(x)$ its Christian name.
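A twofold correlation table of the kind just described can be computed directly from classified instances. In this minimal sketch each instance is a pair (ψ-class, φ-class). The entry for row r and column s is $f_{rs}$, the proportion of instances satisfying $\psi_s(x)$ which also satisfy $\phi_r(x)$, so each column sums to 1. The function name `twofold_table` and the toy data are hypothetical. The classes may be ordered degrees of one quantity or, as the text notes, any set of exclusive alternatives; the computation is the same either way.

```python
from collections import Counter


def twofold_table(instances, psi_classes, phi_classes):
    """Return a dict mapping (r, s) to f_rs, the proportion of instances
    satisfying psi_s which also satisfy phi_r.

    `instances` is a list of (psi_class, phi_class) pairs; the caller
    supplies the classes and the order in which they are arranged.
    """
    column_totals = Counter(psi for psi, _ in instances)  # instances per psi_s
    cell_counts = Counter(instances)                      # instances per (psi_s, phi_r)
    table = {}
    for s in psi_classes:
        total = column_totals[s]
        for r in phi_classes:
            # An empty column is given zeros by convention.
            table[(r, s)] = cell_counts[(s, r)] / total if total else 0.0
    return table


if __name__ == "__main__":
    # Hypothetical toy observations: (mother's age class, baby's weight class).
    ages = ["young", "middle", "older"]      # psi_1, psi_2, psi_3
    weights = ["heavy", "medium", "light"]   # phi_1, phi_2, phi_3
    obs = (
        [("young", "heavy")] * 6 + [("young", "medium")] * 3 + [("young", "light")] * 1
        + [("middle", "heavy")] * 2 + [("middle", "medium")] * 5 + [("middle", "light")] * 3
        + [("older", "heavy")] * 1 + [("older", "medium")] * 3 + [("older", "light")] * 6
    )
    table = twofold_table(obs, ages, weights)
    print(" " * 8 + "".join(f"{s:>8}" for s in ages))
    for r in weights:
        print(f"{r:>8}" + "".join(f"{table[(r, s)]:>8.2f}" for s in ages))
```

In the toy data the largest entry of each column lies on the diagonal (0.60, 0.50, 0.60). The following paragraphs show why a grouping of this kind is of interest only if the order of the classes is significant.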

But in order that the correlation table may be of any
practical interest for the purposes of inference, it is necessary
(and this, I think, is one of the critical assumptions of correlation)
that $\psi_1(x), \psi_2(x), \ldots$ and also $\phi_1(x), \phi_2(x), \ldots$ should
be arranged in an order that is significant, i.e. such that we have
some a priori reason for expecting some connection to exist
between the order of the $\psi$'s and the order of the $\phi$'s. The point
of this will be illustrated by concentrating our attention on the
simplest type of case where $\psi(x)$ and $\phi(x)$ are quantitative
characteristics arranged in order of magnitude. Now suppose
it were the case that the younger mothers tended to bear heavier
babies; then, if $\psi_1(x), \psi_2(x)$ are the ages increasing upwards and
$\phi_1(x), \phi_2(x)$ the weights diminishing downwards, $f_{11}$ would probably
be the greatest of the $f_{r1}$'s and, generally speaking, $f_{r1}$ would be
greater than $f_{r+1,1}$; also $f_{22}$ might be the greatest of the $f_{r2}$'s, and
so on; so that the frequencies lying on the diagonal of the table
would be the greatest and the frequencies would tend to be less
the farther they lay from the diagonal. If we had some reason
a priori (i.e. based on our pre-existing knowledge), if only a
slight one, for supposing that there might be some connection
between the age of the mother and the weight of the baby, then,
if in a particular set of instances the frequencies were grouped
about the diagonal as suggested above, this might be taken as
affording some inductive support for the hypothesis.

Now the theory of correlation, as it is expounded in the
text-books, is almost entirely concerned with measuring how
nearly the observed frequencies are grouped about the diagonal
of the table (though the complete theory is not, of course, so
restricted as this). The 'coefficient of correlation' is an algebraical
formula which may be regarded as measuring this phenomenon
in a way that is sufficiently satisfactory for all ordinary purposes.
If it is defined thus, it is simply a statistical description of a
particular set of observations arranged in a particular order.
How can we make use of this coefficient for the purposes of
inference?

Dr. Bowley faces this problem a little more definitely than do
most statistical writers. Mr. Yule warns the student that the
problem  exists,1  but  he  does  not  himself  attack  it  systematically 
or  do  more  than  apply  common  sense  to  particular  problems. 
So  much  greater  emphasis,  however,  has  been  laid  hitherto  on 
the  mathematical  complications,  that  many  statistical  students 
hazily  float  from  defining  the  correlation  coefficient  as  a  statistical 
description  to  employing  it  as  a  measure  of  the  probability  of  a 
statistical generalisation as to the association between quantitative
variations of $\phi(x)$ and $\psi(x)$ respectively. If, for example,
it is found in a particular set of observations of
mothers'  ages  and  babies'  weights  that  the  frequencies  are 
closely  ranged  about  the  diagonal,  this  is  considered  a  sufficiently 
good  reason  for  attributing  probability  to  a  generalisation  as  to 
the  '  correlation  '  (i.e.  tendency  to  quantitative  correspondence) 
between  the  age  of  the  mother  and  the  weight  of  the  baby. 

1 Introduction to the Theory of Statistics, p. 191: "The coefficient of correlation, like an average or a measure of dispersion, only exhibits in a summary and comprehensible form one particular aspect of the facts on which it is based, and the real difficulties arise in the interpretation of the coefficient when obtained."

Dr. Bowley's line of thought is as follows. He begins by
defining the correlation coefficient $r$ merely as a statistical description
(Elements of Statistics, p. 354). He then shows (p. 355),
as an illustration of the nature of $r$, that if $X$ and $Y$ are two
variable quantities which depend (more strictly, are known to
depend) on other variables $U, V, W$ in such a way that

$$X_t = {}_1U_t + {}_2U_t + \cdots + {}_pU_t + {}_1V_t + {}_2V_t + \cdots + {}_qV_t,$$
$$Y_t = {}_1U_t + {}_2U_t + \cdots + {}_pU_t + {}_1W_t + {}_2W_t + \cdots + {}_qW_t,$$

where ${}_1U_t, {}_2U_t, \ldots, {}_1V_t, {}_2V_t, \ldots, {}_1W_t, {}_2W_t, \ldots$ are selected
at random each from an independent group of quantities (more
strictly, are, relative to our data, random members of independent
groups); then, if we know a priori certain statistical coefficients
descriptive of the constitution of these groups, the value of $r$
will probably tend towards a certain value. So far we are on
fairly safe, but not very fruitful, ground. We have no basis
for arguing backwards from the observed value of $r$; but,
provided we have rather extensive and peculiar knowledge
a priori as to how $X_t$ and $Y_t$ are constituted, then we have
calculable expectations as to the limits within which the value
of $r$, namely the correlation coefficient between $X$ and $Y$, will
probably turn out to lie, when we have observed it.

Dr. Bowley's next move is more dubious. If the constitutions
of the independent groups are similar in a certain statistical
respect (i.e. if they have the same standard deviations), then,
Dr. Bowley concludes, $r = \frac{p}{p+q}$, which "expressed in words
shows that the correlation coefficient tends to be the ratio of
the number of causes common in the genesis of two variables
to the whole number of independent causes on which each
depends." By this time the student's mind, unless anchored
by a more than ordinary scepticism, will have been well launched
into a vague, fallacious sea.

Neglecting,  however,  the  dictum  just  quoted,  we  find  that  the 
second  stage  of  the  argument  consists  in  showing  that,  if  we 
have  a  certain  sort  of  knowledge  a  priori  as  to  how  our  variables 
are  constituted,  then  the  various  possible  values  for  the  coefficients 
of correlation, which would be yielded by actual sets of observations
made in prescribed conditions, will have, a priori, and
before  the  observations  have  been  made,  calculable  probabilities, 
certain  ranges  of  values  being  probable  and  others  improbable. 
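The "fairly safe" forward direction of the argument can be checked by simulation. This sketch follows the manner of Dr. Bowley's illustration: each $X_t$ is the sum of $p$ causes shared with $Y_t$ and $q$ causes peculiar to itself, and likewise for $Y_t$. Every cause is taken to be an independent standard normal draw. The normal distribution and the figures $p=2$, $q=3$, $n=100$ are assumptions. Under these conditions the population correlation is $p/(p+q)$, and the coefficients from repeated sets of observations cluster about that value. Some ranges of $r$ are therefore probable before the observations are made, and others improbable. Nothing in the sketch licenses the backward argument from an observed $r$ to the constitution of the variables.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def one_set_of_observations(p, q, n, rng):
    """n pairs (X_t, Y_t): p causes common to both, q causes peculiar to
    each; every cause is an independent standard normal draw (an assumption)."""
    xs, ys = [], []
    for _ in range(n):
        common = sum(rng.gauss(0, 1) for _ in range(p))
        xs.append(common + sum(rng.gauss(0, 1) for _ in range(q)))
        ys.append(common + sum(rng.gauss(0, 1) for _ in range(q)))
    return xs, ys


def sampling_distribution_of_r(p, q, n, sets, seed=0):
    """Correlation coefficients from `sets` independent sets of n observations."""
    rng = random.Random(seed)
    return [pearson_r(*one_set_of_observations(p, q, n, rng)) for _ in range(sets)]


if __name__ == "__main__":
    p, q, n = 2, 3, 100
    rs = sampling_distribution_of_r(p, q, n, sets=300)
    print(f"p/(p+q) = {p / (p + q):.2f}")
    print(f"mean r = {statistics.fmean(rs):.3f}, s.d. of r = {statistics.pstdev(rs):.3f}")
```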

As  a  rule,  however,  we  are  not  arguing  from  knowledge  about 
the  variables  to  anticipations  about  their  correlation  coefficient  ; 
but  the  other  way  round,  that  is  from  observations  of  their 
correlation coefficients to theories about the nature of the variables.
Dr. Bowley perceives that this involves a third stage
of the argument, and appeals accordingly (p. 409) to "the
difficult and elusive theory of inverse probability." He apprehends
the difficulty but he does not pursue it; and, like Mr.
Yule,  he  really  falls  back  for  practical  purposes  on  the  criteria 
of  common  sense,  an  expedient  well  enough  in  his  case,  but  not 
a  universal  safeguard. 

The  general  argument  from  inverse  probability  to  which  Dr. 
Bowley  makes  his  vague  appeal  is  doubtless  on  the  following 
lines: If there is no causal connection between the two sets of
quantities, then a close grouping of the frequencies about the
diagonal would be a priori improbable (and the greater the
number of the individual observations, the greater the improbability
since, if the quantities are independent, there is, then, all
the more opportunity for 'averaging out'); therefore, inversely,
if the frequencies do group themselves about the diagonal, we
have  a  presumption  in  favour  of  a  causal  connection  between 
the  two  sets  of  quantities. 

But  if  the  reader  recalls  our  discussion  of  the  principle  of 
inverse  probability,  he  will  remember  that  this  conclusion  cannot 
be  reached  unless  a  priori,  and  quite  apart  from  the  observations 
in  question,  we  have  some  reason  for  thinking  that  there  may  be 
such a causal connection between the quantities. The argument
can only strengthen a pre-existing presumption; it cannot
create  one.  And  in  the  absence  of  reasons  peculiar  to  the 
particular  inquiry,  we  have  no  choice  but  to  fall  back  on  the 
general  methods  and  the  general  presumptions  of  induction. 
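The structure of the inverse argument can be written out in a few lines. In this sketch the function name and parameters are hypothetical. The likelihood ratio stands for how much more probable a close grouping about the diagonal is if there is a causal connection than if there is none. However large that ratio, a prior probability of zero yields a posterior of zero: the observations can strengthen a pre-existing presumption but cannot create one. Representing the presumption as a single number is an assumption made for the sketch.

```python
def posterior(prior: float, likelihood_ratio: float) -> float:
    """Probability of a causal connection after the observations (Bayes' rule).

    prior            -- probability of a connection before the observations
    likelihood_ratio -- P(grouping about the diagonal | connection)
                        / P(grouping about the diagonal | no connection)
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio cannot be negative")
    weight_for = prior * likelihood_ratio  # prior times relative likelihood
    weight_against = 1.0 - prior           # "no connection" given relative likelihood 1
    if weight_for == 0.0:
        return 0.0                         # no presumption to strengthen
    return weight_for / (weight_for + weight_against)


if __name__ == "__main__":
    for prior in (0.0, 0.01, 0.1, 0.5):
        print(f"prior {prior:<5} -> posterior {posterior(prior, 1000.0):.4f}")
```

With a likelihood ratio of 1000, priors of 0.01 and 0.1 give posteriors of about 0.91 and 0.99. A prior of 0 stays at 0.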

It  is  apparent  that,  where  the  correlation  argument  seems 
plausible,  some  tacit  assumption  must  have  slipped  in,  if  we  return 
to  the  case  where  our  correlation  table  relates  to  the  weights  of 
the  babies  and  their  Christian  names.  Either  by  accident  or 
because  we  had  arranged  the  order  of  the  Christian  names  to 
suit,  it  might  happen  with  a  particular  set  of  observations,  even 
a  fairly  numerous  set,  that  the  correlation  coefficient  was  large. 
Yet  on  that  evidence  alone  we  should  hardly  assert  a  generalisation 
connecting  the  weights  of  babies  with  their  Christian  names. 
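Invented data can show the danger of arranging the order to suit. In the sketch below names and weights are drawn independently, so by construction there is no connection between them. The names, the weight distribution (mean 3.4, s.d. 0.5) and the sample sizes are hypothetical. Ordering the names by their average weight in a small sample can never give a negative coefficient for that sample. Yet the same order, applied to a large fresh sample, gives a coefficient near zero.

```python
import random
import statistics

# Hypothetical list of Christian names.
NAMES = ["Ann", "Bill", "Clare", "David", "Emma", "Frank", "Grace", "Harry"]


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def babies(count, rng):
    """(name, weight in kg) pairs; name and weight are drawn independently,
    so by construction there is no connection between them."""
    return [(rng.choice(NAMES), rng.gauss(3.4, 0.5)) for _ in range(count)]


def order_to_suit(sample):
    """Arrange the names after looking at the data: lightest mean weight first."""
    groups = {}
    for name, weight in sample:
        groups.setdefault(name, []).append(weight)
    return sorted(groups, key=lambda name: statistics.fmean(groups[name]))


def r_for_order(sample, order):
    """Correlation between a name's position in `order` and the baby's weight.
    Babies whose name does not appear in `order` are left out."""
    position = {name: i for i, name in enumerate(order)}
    pairs = [(position[name], weight) for name, weight in sample if name in position]
    return pearson_r([x for x, _ in pairs], [y for _, y in pairs])


if __name__ == "__main__":
    rng = random.Random(3)
    small = babies(40, rng)
    order = order_to_suit(small)
    print(f"sample used to choose the order: r = {r_for_order(small, order):.2f}")
    fresh = babies(10_000, rng)
    print(f"fresh sample, same order:        r = {r_for_order(fresh, order):.2f}")
```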

The  truth  is  that  sensible  investigators  only  employ  the 
correlation  coefficient  to  test  or  confirm  conclusions  at  which 
they  have  arrived  on  other  grounds.  But  that  does  not  validate 
the  crude  way  in  which  the  argument  is  sometimes  presented, 
or prevent it from misleading the unwary, since not all investigators
are sensible.

If  we  abandon  the  method  of  inverse  probability  in  favour  of 
the  less  precise  but  better  founded  processes  of  induction, 
'  quantitative  correlation,'  as  I  should  like  to  term  this  particular 
branch  of  statistical  induction,  is  more  complicated  than,  but  not 
theoretically  distinct  from,  the  kind  of  arguments  which  have 
occupied  the  earlier  paragraphs  of  this  chapter.  The  character 
of  the  additional  complication  can  be  described  by  saying  that 
we  are  presented  with  a  two-dimensional  problem  instead  of  a 
one-dimensional  problem.  The  mere  existence  of  a  particular 
correlation  coefficient  as  descriptive  of  a  group  of  observations, 
even  of  a  large  group,  is  not  in  itself  a  more  conclusive  or  significant 
argument  than  the  mere  existence  of  a  particular  frequency 
coefficient  would  be.  Of  course  if  we  have  a  considerable  body 
of pre-existing knowledge relevant to the particular inquiry, the
calculation  of  a  small  number  of  correlation  coefficients  may  be 
crucial. But otherwise we must proceed as in the case of frequency
coefficients; that is to say we must have before us, in
order to found a satisfactory argument, many sets of observations,
of which the correlation coefficients display a significant
stability in the midst of variation in the non-essential class
characteristics (i.e. those class characteristics which our generalisation
proposes to neglect) of the different sets of observations.
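The requirement of stability across many sets can be sketched in the same way. Each set of observations shares one underlying connection between the two quantities, a population correlation of 0.6. The level and the unit of measurement vary from set to set, as hypothetical stand-ins for the non-essential characteristics; all these figures are assumptions. The correlation coefficient is unchanged by a shift of origin or a positive change of scale, so the coefficients should agree to within sampling error. It is that agreement across sets, rather than any single value, that would ground the argument.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def observation_set(n, shift, scale, rng, link=0.6):
    """n pairs with population correlation `link`; `shift` and `scale`
    change the level and unit of both quantities (hypothetical stand-ins
    for non-essential differences between the sets)."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = link * x + (1 - link ** 2) ** 0.5 * rng.gauss(0, 1)
        xs.append(shift + scale * x)
        ys.append(shift + scale * y)
    return xs, ys


def coefficients_across_sets(settings, n=2000, seed=0):
    """One correlation coefficient per (shift, scale) setting."""
    rng = random.Random(seed)
    return [pearson_r(*observation_set(n, shift, scale, rng))
            for shift, scale in settings]


if __name__ == "__main__":
    settings = [(0, 1), (10, 2), (-5, 0.5), (100, 30), (3, 7)]
    rs = coefficients_across_sets(settings)
    print("coefficients:", [round(r, 3) for r in rs])
    print(f"spread: {max(rs) - min(rs):.3f}")
```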

10.  I  am  now  at  the  conclusion  of  an  inquiry  in  which, 
beginning with fundamental questions of logic, I have endeavoured
to  push  forward  to  the  analysis  of  some  of  the  actual  arguments 
which  impress  us  as  rational  in  the  progress  of  knowledge  and  the 
practice  of  empirical  science.  In  writing  a  book  of  this  kind  the 
author must, if he is to put his point of view clearly, pretend sometimes
to a little more conviction than he feels. He must give
his  own  argument  a  chance,  so  to  speak,  nor  be  too  ready  to 
depress  its  vitality  with  a  wet  cloud  of  doubt.  It  is  a  heavy  task 
to  write  on  these  problems  ;  and  the  reader  will  perhaps  excuse 
me if I have sometimes pressed on a little faster than the difficulties
were overcome, and with decidedly more confidence than
I  have  always  felt. 

In laying the foundations of the subject of Probability, I have
departed  a  good  deal  from  the  conception  of  it  which  governed 
the  minds  of  Laplace  and  Quetelet  and  has  dominated  through 
their  influence  the  thought  of  the  past  century, — though  I  believe 
that  Leibniz  and  Hume  might  have  read  what  I  have  written  with 
sympathy.  But  in  taking  leave  of  Probability,  I  should  like  to 
say  that,  in  my  judgment,  the  practical  usefulness  of  those  modes 
of  inference,  here  termed  Universal  and  Statistical  Induction, 
on  the  validity  of  which  the  boasted  knowledge  of  modern  science 
depends,  can  only  exist— and  I  do  not  now  pause  to  inquire 
again  whether  such  an  argument  must  be  circular— if  the  universe 
of  phenomena  does  in  fact  present  those  peculiar  characteristics 
of atomism and limited variety which appear more and more
clearly  as  the  ultimate  result  to  which  material  science  is  tending  : 

fateare necessest
materiem quoque finitis differre figuris.
(You must admit that matter too differs by a finite number of shapes.)

The physicists of the nineteenth century have reduced matter to
the collisions and arrangements of particles, between which the
ultimate  qualitative  differences  are  very  few  ;  and  the  Mendelian 
biologists  are  deriving  the  various  qualities  of  men  from  the 
collisions  and  arrangements  of  chromosomes.  In  both  cases  the 
analogy  with  the  perfect  game  of  chance  is  really  present ;  and 
the  validity  of  some  current  modes  of  inference  may  depend  on  the 
assumption  that  it  is  to  material  of  this  kind  that  we  are  applying 
them.  Here,  though  I  have  complained  sometimes  at  their  want 
of  logic,  I  am  in  fundamental  sympathy  with  the  deep  underlying 
conceptions of the statistical theory of the day. If the contemporary
doctrines of Biology and Physics remain tenable, we may
have  a  remarkable,  if  undeserved,  justification  of  some  of  the 
methods  of  the  traditional  Calculus  of  Probabilities.  Professors 
of  probability  have  been  often  and  justly  derided  for  arguing  as 
if  nature  were  an  urn  containing  black  and  white  balls  in  fixed 
proportions. Quetelet once declared in so many words: "l'urne
que nous interrogeons, c'est la nature" (the urn we question is nature). But again in the
history of science the methods of astrology may prove useful to
the astronomer; and it may turn out to be true, reversing
Quetelet's expression, that "La nature que nous interrogeons,
c'est une urne" (the nature we question is an urn).
INTRODUCTION

There  is  no  opinion,  however  absurd  or  incredible,  which  has  not  been 
maintained  by  some  one  of  our  philosophers. — DESCARTES. 

THE following Bibliography does not pretend to be complete,
but it contains a much longer list of what has been written
about Probability than can be found elsewhere. I have
hesitated a little before burdening this volume with the titles
of many works, so few of which are still valuable. But I was
myself much hampered, when I first embarked on the study of
this subject, by the absence of guide-posts to its scattered but
extensive literature; and a list which I drew up for my own
convenience, without much attention to bibliographical nicety
or to exact uniformity in the style of entry, may be useful
to others.

It is rather an arbitrary matter to decide what to include
and what to exclude. Probability overlaps many other topics,
and some of the most important references to it are to be
found in books whose main topic is something else. On
the other hand, it would be absurd to include every casual
reference; and no useful purpose would have been served by
cataloguing the very numerous volumes dealing with Insurance,
Games of Chance, Statistics, Errors of Observation, and Least
Squares, which treat in detail these various applications of the
Theory of Probability. It has been a matter of some difficulty,
therefore, to know precisely where to draw the line. Where
the main subject of a book or paper is Probability proper, I
have included it, nearly regardless of my own view as to its
importance, and have not attempted to act as censor; but
where Probability is not the main subject, or where an application
of Probability is concerned whose chief interest lies
solely in the application itself, I have included the entry only
where I think it important, intrinsically or historically or
from the celebrity of the author. In particular, the existence
of Professor Mansfield Merriman's very extensive bibliography,
published in the Transactions of the Connecticut Academy for
1877, has made it possible to deal very lightly (and to the
extent of but few entries) with the inordinately large literature
of Least Squares. Merriman's list comprises 408 titles of writings
relating to the Method of Least Squares and the theory of
accidental errors of observation, and is sufficiently exhaustive
so far as relates to memoirs on this topic published before
1877.

Of bibliographical sources for Probability proper, Todhunter's
History of the Mathematical Theory of Probability
and Laurent's Calcul des probabilités are alone important. Of
mathematical works published before the time of Laplace,
Todhunter's list, and also his commentary and analysis, are
complete and exact: a work of true learning, beyond criticism.
The bibliographical catalogue at the conclusion of Laurent's
Calcul (published in 1873) is the longest list published hitherto
of general works on Probability. But it is unduly swollen by
the inclusion of numerous items on Insurance and Errors of
Observation, the bearing of which on Probability is very
slight;¹ it is chiefly mathematical in bias; and it is now
nearly fifty years old.

I have not read all these books myself, but I have read
more of them than it would be good for any one to read again.
There are here enumerated many dead treatises and ghostly
memoirs. The list is too long, and I have not always successfully
resisted the impulse to add to it in the spirit of a
collector. There are not above a hundred of these which it
would be worth while to preserve, if only it were securely
ascertained which these hundred are. At present a bibliographer
takes pride in numerous entries; but he would be a
more useful fellow, and the labours of research would be
lightened, if he could practise deletion and bring into existence
an accredited Index Expurgatorius. But this can only be
accomplished by the slow mills of the collective judgment of
the learned; and I have already indicated my own favourite
authors in copious footnotes to the main body of the text.

¹ Laurent's list contains 310 titles, of which I have excluded 174 from my
list as being insufficiently relevant.

The list is long; yet there is, perhaps, no subject of equal
importance and of equal fascination to men's minds on which
so little has been written. It is now fifty-five years since
Dr. Venn, still an accustomed figure in the streets and courts
of Cambridge, first published his Logic of Chance; yet amongst
systematic works in the English language on the logical foundations
of Probability my Treatise is next to his in chronological
order.

The student will find many famous names recorded here.
The subject has preserved its mystery, and has thus attracted
the notice, profound or, more often, casual, of most speculative
minds. Leibniz, Pascal, Arnauld, Huygens, Spinoza, Jacques
and Daniel Bernoulli, Hume, D'Alembert, Condorcet, Euler,
Laplace, Poisson, Cournot, Quetelet, Gauss, Mill, Boole,
Tchebychef, Lexis, and Poincaré, to name only those who are
dead, are catalogued below.

Mag. (4), vol. 27, 1864.

happen in making Observations." The Analyst or Math. Museum, vol. 1,
pp. 93-109, 1808.
[This paper, which contains the first deduction of the normal law of
error, was partly reprinted by Abbe with historical notes in Amer. Journ.
Sci. vol. i. pp. 411-415, 1871.]

Sm. 8vo. London, 1738.

Todhunter's History, pp. 48-53.]

"An Argument for Divine Providence, taken from the constant Regularity
observ'd in the Births of both Sexes." Phil. Trans. vol. 27, pp. 186-190
(1710-12).
[Argues that the excess of male births is so invariable that we may
conclude that it is not an even chance whether a male or female be born.]

  and Petersburg Paradox, 316
  and Bernoulli's Theorem, 340 n.

  method of, 384

  methods, 389, 406-417
Universe of reference, 117, 129, 130

Variables in Probability, 58, 123, 412 n.
Variety, 234
  and induction, 219
  limitation of, 258, 260, 427
Venn, 84, 106 n., 294 n.
  and experience, 85
  and frequency theory, 93 f.
  and inverse probability, 100
  and Least Squares, 206 n.
  and induction, 273
  and chance, 288

Weight, of evidence, 312

Whitehead, and frequency theory, 101
  and invalid inference, 329 n.
Whittaker, E. T., and Rule of Suc

Wilbraham, H., and Boole, 167 n.
Wolf and dice, 362

  and approximation, 161
  and independence, 166
  and 'statistics,' 327


O False and treacherous Probability,
Enemy of truth, and friend to wickednesse;
With whose bleare eyes Opinion learnes to see,
Truth's feeble party here, and barrennesse.


THE    END 


Printed by R. & R. CLARK, LIMITED, Edinburgh.

